Abstract. We introduce graphical time series models for the analysis of dynamic
relationships among variables in multivariate time series. The modelling approach 
is based on the notion of strong Granger causality and can be applied to time series 
with non-linear dependencies. The models are derived from ordinary time series 
models by imposing constraints that are encoded by mixed graphs. In these graphs 
each component series is represented by a single vertex and directed edges indicate 
possible Granger-causal relationships between variables while undirected edges are 
used to map the contemporaneous dependence structure. We introduce various 
notions of Granger-causal Markov properties and discuss the relationships among 
them and to other Markov properties that can be applied in this context. 
1. Introduction

Graphical models have become an important tool for the statistical analysis of 
complex multivariate data sets, which are now increasingly available in many scien- 
tific fields. The key feature of these models is to merge the probabilistic concept of 
conditional independence with graph theory by representing possible dependencies 
among the variables of a multivariate distribution in a graph. This has led to simple 
graphical criteria for identifying the conditional independence relations that are im- 
plied by a model associated with a given graph. Further important advantages of the 
graphical modelling approach are statistical efficiency due to parsimonious parame- 
terizations of the joint distribution of the variables and the visualization of complex 
dependence structures, which allows an intuitive understanding of the interrelations 
among the variables and, thus, facilitates the communication of statistical results.



For an introduction to graphical models we refer to the monographs by Whittaker
(1990), Edwards (2000), and Cox and Wermuth (1996); a mathematically more rigorous
treatment can be found in Lauritzen (1996).

While graphical models were originally developed for variables that are sampled
with independent replications, they have more recently also been applied to the
analysis of time-dependent data. Some first general remarks concerning the potential
use of graphical models in time series analysis can be found in Brillinger (1996); since
then there has been an increasing interest in the use of graphical modelling techniques
for analyzing multivariate time series (e.g., Stanghellini and Whittaker 1999,
Dahlhaus 2000, Reale and Tunnicliffe Wilson 2001, Dahlhaus and Eichler 2003,
Oxley et al. 2004, Moneta and Spirtes 2005, Eichler 2006a,b). However, all these works
have been restricted to the analysis of linear interdependencies among the variables
whereas the recent trend in time series analysis has shifted towards non-linear
parametric and non-parametric models (e.g., Tong 1993, Rothman 1999, Fan and Yao
2003). Moreover, in most of these approaches, the variables at different time points
are represented by separate nodes, which leads to graphs with theoretically infinitely
many vertices for which no rigorous theory exists so far.

The paper was written while the author was working at the Institut für Angewandte
Mathematik, Universität Heidelberg, Germany.

E-mail address: m.eichler@ke.unimaas.nl (M. Eichler).

In this paper, we present a general approach for graphical modelling of multi- 
variate stationary time series, which is based on simple graphical representations 
of the dynamic dependencies of a process. To this end, we utilize the concept of
strong Granger causality (e.g., Florens and Mouchart 1982), which is formulated in
terms of conditional independencies and, thus, can be applied to model arbitrary
non-linear relationships among the variables. The concept of Granger causality was
originally introduced by Granger (1969) and is commonly used for studying
dynamic relationships among the variables in multivariate time series.

For the graphical representations, we consider mixed graphs in which each vari- 
able as a complete time series is represented by a single vertex and directed edges 
indicate possible Granger-causal relationships among the variables while undirected 
edges are used to map the contemporaneous dependence structure. We note that
similar graphs have been used in Eichler (2006a) as path diagrams for the
autoregressive structure of weakly stationary processes. Formally, the graphical encoding
of the dynamic structure of a time series is achieved by a new type of Markov 
properties, which we call Granger-causal Markov properties. We introduce various 
levels, namely the pairwise, the local, the block-recursive, and the global Granger- 
causal Markov property, and discuss the relationships among them. In particular, we 
give sufficient conditions under which the various Granger-causal Markov properties 
are equivalent; such conditions allow formulating models based on a simple Markov 
property while interpreting the associated graph by use of the global Granger-causal 
Markov property. 

The paper is organized as follows. In Section 2 we introduce the concepts of
Granger-causal Markov properties and graphical time series models; some examples
of graphical time series models are presented in Section 3. In Section 4 we discuss
global Markov properties, which relate certain separation properties of the graph
to conditional independence or Granger noncausality relations among the variables
of the process. Finally, in Section 5 we compare the presented graphical modelling
approach with other approaches in the literature and discuss possible extensions.
The proofs are technical and are deferred to the appendix.



2. Graphical time series models 

In graphical modelling, the focus is on multivariate statistical models for which 
the possible dependencies between the studied variables can be represented by a 
graph. In multivariate time series analysis, statistical models for a time series
$X_V = (X_V(t))_{t\in\mathbb{Z}}$ are usually specified in terms of the conditional
distribution of $X_V(t+1)$ given its past $\bar{X}_V(t) = (X_V(s))_{s\le t}$ in order to
study the dynamic relationships over time among the series. Thus, a time series model
may be described formally as a family of probability kernels $P$ from
$\mathbb{R}^{V\times\mathbb{N}}$ to $\mathbb{R}^V$, and we write $X_V \sim P$ if $P$ is a
version of the conditional probability of $X_V(t+1)$ given $\bar{X}_V(t)$.

For modelling specific dependence structures, we utilize the concept of Granger
(non-)causality, which was introduced by Granger (1969) and has proved to
be particularly useful for studying dynamic relationships in multivariate time series.
This probabilistic concept of noncausality from a process $X_a$ to another process $X_b$ is
based on studying whether at time $t$ the next value of $X_b$ can be better predicted by
using the entire information up to time $t$ than by using the same information apart
from the former series $X_a$. In practice, not all relevant variables may be available
and, thus, the notion of Granger causality clearly depends on the information set
that is used. In the sequel, we use the concept of strong Granger noncausality (e.g.,
Florens and Mouchart 1982), which is defined in terms of conditional independence
and $\sigma$-algebras and, thus, can also be used for non-linear time series models.

Let $X_V = (X_V(t))_{t\in\mathbb{Z}}$ with $X_V(t) = (X_v(t))'_{v\in V} \in \mathbb{R}^V$ be a
multivariate stationary stochastic process on a probability space
$(\Omega, \mathscr{F}, P)$. For $A \subseteq V$, we denote by
$X_A = (X_A(t))_{t\in\mathbb{Z}}$ the multivariate subprocess with components $X_a$, $a \in A$.
The information provided by the past and present values of $X_A$ at time $t \in \mathbb{Z}$
can be represented by the sub-$\sigma$-algebra $\mathscr{X}_A(t)$ of $\mathscr{F}$ that is
generated by $\bar{X}_A(t) = (X_A(s))_{s\le t}$. We write
$\mathscr{X}_A = (\mathscr{X}_A(t),\, t \in \mathbb{Z})$ for the filtration induced by $X_A$.
This leads to the following definition of (strong) Granger noncausality in multivariate
time series.

Definition 2.1. Let A and B be disjoint subsets of V. 

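(i) $X_A$ is strongly Granger-noncausal for $X_B$ with respect to the filtration $\mathscr{X}_V$ if

$$\mathscr{X}_B(t+1) \perp\!\!\!\perp \mathscr{X}_A(t) \mid \mathscr{X}_{V\setminus A}(t)$$

for all $t \in \mathbb{Z}$. This will be denoted by $X_A \nrightarrow X_B\ [\mathscr{X}_V]$.

(ii) $X_A$ and $X_B$ are contemporaneously conditionally independent with respect to
the filtration $\mathscr{X}_V$ if

$$\mathscr{X}_A(t+1) \perp\!\!\!\perp \mathscr{X}_B(t+1) \mid \mathscr{X}_V(t) \vee \mathscr{X}_{V\setminus(A\cup B)}(t+1)$$

for all $t \in \mathbb{Z}$. This will be denoted by $X_A \nsim X_B\ [\mathscr{X}_V]$.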

In the following, we will speak simply of Granger (non-)causality in the sense of
the above definition. 

Intuitively, the dynamic relationships of a stationary multivariate time series $X_V$
can be visualized by a mixed graph $G = (V, E)$ in which each vertex $v \in V$
represents one component $X_v$ and two vertices $a$ and $b$ are joined by a directed edge
$a \to b$ whenever $X_a$ is Granger-causal for $X_b$ or by an undirected edge $a - b$
whenever $X_a$ and $X_b$ are contemporaneously conditionally dependent. Conversely,
for formulating models with specific dynamic dependencies, a mixed graph $G$ can
be associated with a set of Granger noncausality and contemporaneous conditional
independence constraints that are imposed on a time series model for $X_V$. Such a
set of conditional independence relations encoded by a graph $G$ is generally known
as a Markov property with respect to $G$. In the context of multivariate time series,
graphs may encode different types of conditional independence relations, and we
therefore speak of Granger-causal Markov properties when dealing with Granger
noncausality and contemporaneous conditional independence relations. In the
following definition, $\mathrm{pa}(a) = \{v \in V \mid v \to a \in E\}$ denotes the set of
parents of a vertex $a$, while $\mathrm{ne}(a) = \{v \in V \mid v - a \in E\}$ is the set of
neighbours of $a$; furthermore, for $A \subseteq V$, we define
$\mathrm{pa}(A) = \bigcup_{a\in A} \mathrm{pa}(a) \setminus A$ and
$\mathrm{ne}(A) = \bigcup_{a\in A} \mathrm{ne}(a) \setminus A$.

Figure 2.1. Encoding of relations $X_A \nrightarrow X_B\ [\mathscr{X}_V]$ by the (a) pairwise,
(b) local, and (c) block-recursive Granger-causal Markov property ($A$ and $B$ are
indicated by grey and black nodes, respectively).

Definition 2.2 (Granger-causal Markov properties). Let $G = (V, E)$ be a
mixed graph. Then the stochastic process $X_V$ satisfies

(PC) the pairwise Granger-causal Markov property with respect to $G$ if for all
$a, b \in V$ with $a \ne b$

(i) $a \to b \notin E \;\Rightarrow\; X_a \nrightarrow X_b\ [\mathscr{X}_V]$,

(ii) $a - b \notin E \;\Rightarrow\; X_a \nsim X_b\ [\mathscr{X}_V]$;

(LC) the local Granger-causal Markov property with respect to $G$ if for all $a \in V$

(i) $X_{V\setminus(\mathrm{pa}(a)\cup\{a\})} \nrightarrow X_a\ [\mathscr{X}_V]$,

(ii) $X_{V\setminus(\mathrm{ne}(a)\cup\{a\})} \nsim X_a\ [\mathscr{X}_V]$;

(BC) the block-recursive Granger-causal Markov property with respect to $G$ if for
all subsets $A$ of $V$

(i) $X_{V\setminus(\mathrm{pa}(A)\cup A)} \nrightarrow X_A\ [\mathscr{X}_V]$,

(ii) $X_{V\setminus(\mathrm{ne}(A)\cup A)} \nsim X_A\ [\mathscr{X}_V]$.

Similarly, if $P$ is a probability kernel from $\mathbb{R}^{V\times\mathbb{N}}$ to $\mathbb{R}^V$,
we say that $P$ satisfies the pairwise, the local, or the block-recursive Granger-causal
Markov property with respect to a graph $G$ whenever the same is true for every
stationary process $X_V$ with $X_V \sim P$.
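The sets appearing in Definition 2.2 can be computed mechanically from the edge lists of a mixed graph. The following Python sketch illustrates this; the edge set is an assumption pieced together from edges mentioned in the examples of this paper (it is not a reproduction of Figure 2.1), and all function names are hypothetical.

```python
# Hypothetical mixed graph on V = {1, ..., 5}. The edge set is an assumption
# pieced together from edges mentioned in the examples; it is not Figure 2.1.
V = {1, 2, 3, 4, 5}
DIRECTED = {(1, 3), (3, 1), (2, 3), (4, 2), (3, 4), (3, 5), (5, 4)}  # (a, b): a -> b
UNDIRECTED = {frozenset({2, 3})}                                      # {a, b}: a - b


def pa(A):
    """Parents of A: vertices outside A with a directed edge into A."""
    A = set(A)
    return {v for (v, a) in DIRECTED if a in A} - A


def ne(A):
    """Neighbours of A: vertices outside A joined to A by an undirected edge."""
    A = set(A)
    return {v for e in UNDIRECTED if e & A for v in e} - A


def pairwise_noncausal():
    """(PC)(i): ordered pairs (a, b) with a -> b absent, so X_a is noncausal for X_b."""
    return {(a, b) for a in V for b in V if a != b and (a, b) not in DIRECTED}


def block_noncausal(A):
    """(BC)(i): the set C = V minus (pa(A) | A), for which X_C is noncausal for X_A."""
    A = set(A)
    return V - (pa(A) | A)


def block_contemporaneous(A):
    """(BC)(ii): the set D = V minus (ne(A) | A), contemporaneously independent of X_A."""
    A = set(A)
    return V - (ne(A) | A)


print(pa({4}))                   # parents of vertex 4
print(block_noncausal({4}))      # the local property (LC)(i) is the case A = {a}
print(block_noncausal({4, 5}))   # block-recursive property for A = {4, 5}
```

The local property is obtained from the block-recursive one by restricting to singletons, which is why a single helper serves both.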

Example 2.3. To illustrate the various Granger-causal Markov properties, we consider
the graph $G$ in Figure 2.1. Suppose that a stationary process $X_V$ satisfies the
pairwise Granger-causal Markov property with respect to this graph $G$. Then the
absence of the edge $1 \to 4$ in $G$ implies that $X_1$ is Granger-noncausal for $X_4$ with
respect to $\mathscr{X}_V$. Next, in the case of the local Granger-causal Markov property, we
find that the bivariate subprocess $X_{\{1,2\}}$ is Granger-noncausal for $X_4$ with respect
to $\mathscr{X}_V$ since vertex 4 has parents 3 and 5. Similarly, if $X_V$ obeys the block-recursive
Granger-causal Markov property, the graph encodes that $X_{\{1,2\}}$ is Granger-noncausal
for $X_{\{4,5\}}$ with respect to $\mathscr{X}_V$ since $\mathrm{pa}(\{4,5\}) = \{3\}$.

The block-recursive Granger-causal Markov property obviously implies the other
two Granger-causal Markov properties and, thus, is the strongest of the three Markov
properties; similarly, the pairwise Granger-causal Markov property clearly is the
weakest of the three properties. The question arises whether and under which
conditions the three Granger-causal Markov properties are equivalent. In the case of
random vectors $Y_V = (Y_v)_{v\in V}$ with values in $\mathbb{R}^V$, the various levels of Markov
properties for graphical interaction models are equivalent if the distribution of $Y_V$
satisfies

$$Y_A \perp\!\!\!\perp Y_B \mid Y_{C\cup D} \;\wedge\; Y_A \perp\!\!\!\perp Y_C \mid Y_{B\cup D} \;\Rightarrow\; Y_A \perp\!\!\!\perp Y_{B\cup C} \mid Y_D$$

for all disjoint subsets $A$, $B$, $C$, and $D$ of $V$ (Pearl and Paz 1987). A sufficient
condition for this intersection property is that the joint distribution of $Y_V$ is absolutely
continuous with respect to some product measure and has a positive and continuous
density (e.g., Lauritzen 1996, Prop. 3.1). The following result establishes similar
conditions for the time series case.
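To see why some positivity assumption is needed for the intersection property, the following Python sketch checks it on two small discrete distributions with $D = \emptyset$. Both distributions and all helper names are illustrative assumptions, not taken from the paper: in the first, the three variables are copies of one fair coin (so the probability mass function has zeros), and in the second the mass function is strictly positive.

```python
from itertools import product


def marginal(pmf, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for w, p in pmf.items():
        key = tuple(w[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_indep(pmf, xs, ys, zs, tol=1e-12):
    """Check X indep. Y given Z via p(x,y,z) p(z) = p(x,z) p(y,z) on the full state space."""
    pxyz, pxz = marginal(pmf, xs + ys + zs), marginal(pmf, xs + zs)
    pyz, pz = marginal(pmf, ys + zs), marginal(pmf, zs)
    for w in pmf:  # pmf must list every state, including those with probability 0
        x, y, z = (tuple(w[i] for i in idx) for idx in (xs, ys, zs))
        if abs(pxyz[x + y + z] * pz[z] - pxz[x + z] * pyz[y + z]) > tol:
            return False
    return True


A, B, C = (0,), (1,), (2,)
states = list(product((0, 1), repeat=3))

# Degenerate case: Y_A = Y_B = Y_C is one fair coin, so the pmf has zeros.
degenerate = {w: (0.5 if w[0] == w[1] == w[2] else 0.0) for w in states}

# Positive case: Y_A is a fair coin independent of the dependent pair (Y_B, Y_C).
q = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
positive = {w: 0.5 * q[w[1:]] for w in states}

for name, pmf in (("degenerate", degenerate), ("positive", positive)):
    premises = cond_indep(pmf, A, B, C) and cond_indep(pmf, A, C, B)
    conclusion = cond_indep(pmf, A, B + C, ())
    print(name, "premises hold:", premises, "conclusion holds:", conclusion)
```

In the degenerate case both premises hold while the conclusion fails, which is the kind of behaviour that a positive density rules out.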

Proposition 2.4. Suppose that the following two conditions hold:

(M) $X_V = (X_V(t))_{t\in\mathbb{Z}}$ is a stationary, strongly mixing stochastic process on some
probability space $(\Omega, \mathscr{F}, P)$ taking values in $\mathbb{R}^V$;

(P) the conditional distribution $P^{X_V(t+1)\mid \bar{X}_V(t)}$ has a regular version that
is almost surely absolutely continuous with respect to some product measure
$\nu$ on $\mathbb{R}^V$ with $\nu$-a.e. positive and continuous density.

Then, for every $\mathscr{F}$-measurable random variable $Y$ and every $t \in \mathbb{Z}$,

$$Y \perp\!\!\!\perp \mathscr{X}_A(t) \mid \mathscr{X}_{B\cup C}(t) \;\wedge\; Y \perp\!\!\!\perp \mathscr{X}_B(t) \mid \mathscr{X}_{A\cup C}(t) \;\Rightarrow\; Y \perp\!\!\!\perp \mathscr{X}_{A\cup B}(t) \mid \mathscr{X}_C(t).$$

With this intersection property, we obtain the following relations among the three 
Granger-causal Markov properties. 

Theorem 2.5. Suppose that $X_V$ satisfies conditions (M) and (P). Then the three
Granger-causal Markov properties (BC), (LC), and (PC) are related by the following
implications:

$$\text{(BC)} \;\Rightarrow\; \text{(LC)} \;\Rightarrow\; \text{(PC)}.$$

Furthermore, if $X_V$ additionally satisfies

$$X_A \nrightarrow X_B\ [\mathscr{X}_V] \;\Leftrightarrow\; X_A \nrightarrow X_b\ [\mathscr{X}_V] \quad \forall\, b \in B, \qquad (2.1)$$

then the three Granger-causal Markov properties (BC), (LC), and (PC) are equivalent.

The theorem shows that, as in the case of chain graph models with the
Andersson-Madigan-Perlman (AMP) Markov property (Andersson et al. 2001), the
pairwise and the local Granger-causal Markov property are in general not sufficiently
strong to encode all Granger-causal relationships that hold among the components
of a multivariate time series with respect to full information $\mathscr{X}_V$. This suggests
specifying graphical time series models in terms of the block-recursive Granger-causal
Markov property.

Definition 2.6 (Graphical time series model). Let $G$ be a mixed graph and
let $\mathscr{P}_G$ be a statistical time series model given by a family of probability kernels
$P \in \mathscr{P}_G$ from $\mathbb{R}^{V\times\mathbb{N}}$ to $\mathbb{R}^V$. Then $\mathscr{P}_G$ is said to be a
graphical time series model associated with the graph $G$ if, for all $P \in \mathscr{P}_G$, the
distribution $P$ satisfies the block-recursive Granger-causal Markov property with
respect to $G$.
The three Granger-causal Markov properties considered so far encode only Granger
noncausality relations with respect to the complete information $\mathscr{X}_V$. The discussion
of phenomena such as spurious causality (e.g., Hsiao 1982, Eichler 2006a), however,
requires also the consideration of Granger-causal relationships with respect to
partial information sets, that is, with respect to filtrations $\mathscr{X}_S$ for subsets $S$ of $V$.
To this end, we introduce in Section 4 a global Granger-causal Markov property
that more generally relates pathways in a graph to Granger-causal relations among
the variables, and we establish, under conditions (M) and (P), its equivalence to the
block-recursive Granger-causal Markov property; this shows that the block-recursive
Granger-causal Markov property is indeed sufficiently rich to describe the dynamic
dependence structure in multivariate time series.

Before we continue our discussion of Markov properties in Section 4, we illustrate
the introduced concept of graphical time series models by a few examples.

3. Examples 

In the previous section, graphical time series models have been defined in terms of
the block-recursive Granger-causal Markov property. For many time series models,
however, condition (2.1) in Theorem 2.5 holds, and, hence, the pairwise and the
block-recursive Granger-causal Markov property are equivalent. This enables us to
derive the constraints on the parameters from the pairwise Markov property.

There are no simple conditions known that are both necessary and sufficient for
(2.1). The following proposition lists some sufficient conditions that cover many
examples, as will be shown subsequently. A counter-example that demonstrates
that the pairwise and the block-recursive Granger-causal Markov property are in
general not equivalent will be provided in Example 3.5.

Proposition 3.1. Suppose that $X_V$ satisfies conditions (M) and (P) and one of the
following conditions:

(i) $X_V$ is a Gaussian process;

(ii) $X_v(t+1)$, $v \in V$, are mutually contemporaneously independent, that is, the
joint conditional distribution factorizes as

$$P^{X_V(t+1)\mid \bar{X}_V(t)} = \bigotimes_{v\in V} P^{X_v(t+1)\mid \bar{X}_V(t)} \quad \forall\, t \in \mathbb{Z};$$

(iii) $X_V(t+1)$ depends on its past only in its conditional mean, that is,

$$X_V(t+1) - E[X_V(t+1) \mid \mathscr{X}_V(t)] \perp\!\!\!\perp \mathscr{X}_V(t) \quad \forall\, t \in \mathbb{Z}.$$

Then the three Granger-causal Markov properties (BC), (LC), and (PC) are equivalent.

Example 3.2 (Vector autoregressive processes). Let $X_V$ be a stationary vector
autoregressive process of order $p$,

$$X_V(t) = \sum_{u=1}^{p} \Phi(u)\, X_V(t-u) + \varepsilon(t), \qquad \varepsilon(t) \sim \mathcal{N}(0, \Sigma), \qquad (3.1)$$

where $\Phi(u)$ are $V \times V$ matrices and the variance matrix $\Sigma$ is non-singular. Then
$X_A$ is Granger-noncausal for $X_B$ with respect to $\mathscr{X}_V$ if and only if the corresponding
entries $\Phi_{BA}(u)$ vanish for all $u = 1, \ldots, p$ (e.g., Boudjellaba et al. 1992).
Furthermore, $X_A$ and $X_B$ are contemporaneously conditionally independent if and only if
the corresponding error components $\varepsilon_A(t)$ and $\varepsilon_B(t)$ are conditionally independent
given all remaining components $\varepsilon_{V\setminus(A\cup B)}(t)$. For Gaussian errors, conditional
independencies are given by zeros in the concentration matrix $K = \Sigma^{-1}$ (e.g., Lauritzen
1996). Consequently, the process $X_V$ satisfies the pairwise (and, by Proposition
3.1(i), also the block-recursive) Granger-causal Markov property with respect to a
graph $G = (V, E)$ if the following two conditions hold:

(i) $a \to b \notin E \;\Rightarrow\; \Phi_{ba}(u) = 0 \quad \forall\, u = 1, \ldots, p$;

(ii) $a - b \notin E \;\Rightarrow\; K_{ab} = K_{ba} = 0$.

Thus, the graphical vector autoregressive model of order $p$ associated with the graph
$G$, denoted by VAR($p$, $G$), is given by the set of all stationary VAR($p$) processes
whose parameters are constrained to zero according to the conditions (i) and (ii).
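Conditions (i) and (ii) can be read in the reverse direction to obtain the smallest mixed graph compatible with given VAR parameters: an edge is drawn exactly where the corresponding parameter is non-zero. The Python sketch below does this for plain nested lists; the function name `var_graph`, the 0-based vertex labels, and the numerical values are illustrative assumptions.

```python
def var_graph(phis, K, tol=0.0):
    """Edges of the smallest mixed graph compatible with conditions (i) and (ii).

    phis: list of p coefficient matrices with phis[u - 1][b][a] = Phi_ba(u)
    K:    concentration matrix of the errors with K[a][b] = K_ab
    Vertices are labelled 0, ..., n - 1.
    """
    n = len(K)
    # a -> b is needed as soon as some lag u has Phi_ba(u) != 0
    directed = {(a, b) for a in range(n) for b in range(n)
                if a != b and any(abs(phi[b][a]) > tol for phi in phis)}
    # a - b is needed as soon as K_ab != 0
    undirected = {frozenset((a, b)) for a in range(n) for b in range(a + 1, n)
                  if abs(K[a][b]) > tol}
    return directed, undirected


# Hypothetical VAR(1) parameters for three series (values chosen for illustration).
phi1 = [[0.5, 0.3, 0.0],
        [0.0, 0.4, 0.2],
        [0.0, 0.0, 0.6]]
K = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

directed, undirected = var_graph([phi1], K)
print(sorted(directed))
print([sorted(e) for e in undirected])
```

With estimated rather than exact parameters, a positive `tol` or a formal test would be needed; the exact-zero comparison here only reflects the model definition.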

Example 3.3 (Nonparametric additive models). More generally, we may also
consider nonlinear autoregressive time series models. For example, let $X_V$ be a
strongly mixing stationary process given by

$$X_b(t) = \sum_{a\in V} \sum_{u=1}^{p} m_{ba}^{(u)}\big(X_a(t-u)\big) + \varepsilon_b(t), \qquad b \in V,\ t \in \mathbb{Z},$$

where $m_{ba}^{(u)}$ are measurable real-valued functions and the errors $\varepsilon_V(t)$ are independent
and identically distributed. Furthermore, we assume that the distribution of the
errors $\varepsilon_V(t)$ has a positive and continuous density $f_V$ on $\mathbb{R}^V$. Then a nonparametric
additive model of order $p$ is given by the set $\mathscr{P} = \{P_{m,f}\}$ of regular conditional
distributions $P_{m,f}$ of such processes $X_V$. For a mixed graph $G = (V, E)$, we now
define $\mathscr{P}_G$ as the subset of all $P_{m,f} \in \mathscr{P}$ such that

(i) $m_{ba}^{(u)}(\cdot)$, $u = 1, \ldots, p$, are constant whenever $a \to b \notin E$ and

(ii) $f_V$ factorizes as $f_V(z_V) = g(z_{V\setminus\{b\}})\, h(z_{V\setminus\{a\}})$ whenever $a - b \notin E$.

The second condition implies $\varepsilon_a(t) \perp\!\!\!\perp \varepsilon_b(t) \mid \varepsilon_{V\setminus\{a,b\}}(t)$ and ensures that the
distribution of the errors $\varepsilon(t)$ obeys the so-called pairwise Markov property with respect
to the undirected subgraph $G^u$ of $G$ (i.e., the subgraph $G^u$ obtained from $G$ by
removing all directed edges). Consequently, $X_a$ and $X_b$ are contemporaneously
conditionally independent with respect to $\mathscr{X}_V$ whenever the corresponding vertices $a$
and $b$ are not joined by an undirected edge in $G$. Similarly, the first condition implies
that, for any process with $X_V \sim P_{m,f}$, the conditional distribution of $X_b(t)$ given
the past $\bar{X}_V(t-1)$ satisfies

$$P^{X_b(t)\mid \bar{X}_V(t-1)} = P^{X_b(t)\mid \bar{X}_{V\setminus\{a\}}(t-1)}$$

whenever the graph $G$ does not contain the edge $a \to b$. It follows that any process
$X_V$ with $X_V \sim P_{m,f} \in \mathscr{P}_G$ obeys the pairwise Granger-causal Markov property
with respect to $G$. Furthermore, since $X_V(t)$ depends on its past $\bar{X}_V(t-1)$ only
in its conditional mean, it follows from Proposition 3.1(iii) that the pairwise and
the block-recursive Granger-causal Markov properties are equivalent. Thus, $\mathscr{P}_G$ is
indeed the graphical nonparametric additive model of order $p$ associated with the
graph $G$.

Example 3.4 (Binary time series). As an example with categorical data, we
consider a binary time series model that has been used for the identification of
neural interactions from neural spike train data (Brillinger 1988a,b). Suppose that
the data consist of the recorded spike trains for a set of neurons, that is, of the
sequences of firing times $(\tau_{v,n})_{n\in\mathbb{N}}$ for neurons $v \in V$, and let $X_v$ be the binary time
series obtained by setting $X_v(t) = 1$ if neuron $v$ has fired in the interval $[t, t+1)$ and
$X_v(t) = 0$ otherwise. Then, the interactions between the neurons can be modelled
by the conditional probabilities

$$P\big(X_b(t) = 1 \mid \mathscr{X}_V(t-1)\big) = \Phi\Big(\sum_{a\in V} U_{ba}(t) - \theta_b\Big), \qquad (3.2)$$

where $\Phi(x)$ denotes the standard normal cumulative distribution function,

$$U_{ba}(t) = \sum_{u=1}^{\gamma_b(t)} g_{ba}(u)\, X_a(t-u) \qquad (3.3)$$

measures the influence of process $a$ on process $b$, and

$$\gamma_b(t) = \min\{u \in \mathbb{N} \mid X_b(t-u) = 1\}$$

is the time elapsed since the last event of process $X_b$. Furthermore, we assume that
the time unit has been chosen small enough such that there are no interactions among
the neurons within one time interval, and that, consequently, the joint conditional
probability factorizes as

$$P\big(X_V(t) = x_V \mid \mathscr{X}_V(t-1)\big) = \prod_{v\in V} P\big(X_v(t) = x_v \mid \mathscr{X}_V(t-1)\big)$$

for all $x_V \in \{0,1\}^V$. Then the pairwise and the block-recursive Granger-causal
Markov property are equivalent by Proposition 3.1(ii) and, thus, we can use the
former for modelling dependencies between the processes. From (3.2) and (3.3), it
follows that $X_a$ is Granger-noncausal for $X_b$ if and only if $g_{ba}(u) = 0$ for all $u \in \mathbb{N}$.
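The conditional firing probability in (3.2) and (3.3) is straightforward to evaluate for a finite recorded past. The Python sketch below is a minimal illustration under stated assumptions: the data layout (`history[a][u - 1]` holds $X_a(t-u)$), the function names, the numerical values, and the fallback used when no spike of $b$ appears in the recorded past are all hypothetical choices, not part of the model as described.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def elapsed(history_b):
    """gamma_b(t): smallest lag u with X_b(t - u) = 1, where history_b[u - 1] = X_b(t - u)."""
    for u, x in enumerate(history_b, start=1):
        if x == 1:
            return u
    return len(history_b)  # assumption: without a recorded spike, use all available lags


def firing_prob(b, history, g, theta):
    """P(X_b(t) = 1 | past) = Phi(sum_a U_ba(t) - theta_b), following (3.2) and (3.3).

    history[a][u - 1] = X_a(t - u) and g[b][a][u - 1] = g_ba(u).
    """
    gamma = elapsed(history[b])
    total = 0.0
    for a in history:
        total += sum(g[b][a][u - 1] * history[a][u - 1] for u in range(1, gamma + 1))
    return normal_cdf(total - theta[b])


# Hypothetical two-neuron example; only the equation for neuron "b" is specified.
history = {"a": [1, 0, 1, 0], "b": [0, 0, 1, 0]}
g = {"b": {"a": [0.5, 0.2, 0.1, 0.0], "b": [-1.0, -0.5, -0.2, 0.0]}}
theta = {"b": 0.4}
print(firing_prob("b", history, g, theta))
```

Setting every `g["b"]["a"][u]` to zero makes the returned probability independent of the past of neuron `a`, which is the Granger-noncausality statement at the end of the example.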

Example 3.5. In general, the pairwise and the block-recursive Granger-causal
Markov property are not equivalent. To demonstrate this, we consider a simple
stationary Markov process $X_V$ with conditional distributions

$$X_V(t) \mid \mathscr{X}_V(t-1) \sim \mathcal{N}\big(0, \Sigma(t)\big)$$

and conditional covariance matrix

$$\Sigma(t) = \begin{pmatrix} 1 & \rho(t) & 0 \\ \rho(t) & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

We assume that the conditional correlation between $X_1(t)$ and $X_2(t)$ given $\mathscr{X}_V(t-1)$
depends on $X_3(t-1)$ by

$$\rho(t) = \begin{cases} \rho & \text{if } |X_3(t-1)| > c, \\ 0 & \text{otherwise} \end{cases}$$

for some constants $\rho$ with $0 < |\rho| < 1$ and $c > 0$. In other words, the variables $X_1$
and $X_2$ become contemporaneously dependent once the most recent value of
variable $X_3$ exceeds a certain threshold in absolute value. For this model, we find that,
on the one hand, the marginal conditional distributions of $X_v(t)$ given $\mathscr{X}_V(t-1)$ are
standard normal and, thus, do not depend on $\mathscr{X}_V(t-1)$. This implies that the process $X_V$
satisfies the pairwise Granger-causal Markov property with respect to the graph
(a) in Figure 3.1. On the other hand, the bivariate conditional distribution of
$(X_1(t), X_2(t))$ depends on the value of $X_3(t-1)$ by the conditional correlation $\rho(t)$.
Hence, $X_V$ obeys the block-recursive Granger-causal Markov property with respect
to the graph (b) in Figure 3.1 but not with respect to the graph (a).

Figure 3.1. Illustration of non-equivalence of pairwise and block-recursive
Granger-causal Markov properties: the process in Example 3.5 satisfies the pairwise
Granger-causal Markov property with respect to the graphs in (a) and (b)
whereas it satisfies the block-recursive Granger-causal Markov property only with
respect to the graph in (b).
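The mechanism of Example 3.5 can be made tangible by simulation. The Python sketch below generates the process and compares the empirical correlation of $(X_1(t), X_2(t))$ in the two regimes determined by $|X_3(t-1)|$; the parameter values $\rho = 0.8$, $c = 1$, the seed, and all function names are assumptions made only for this illustration.

```python
import math
import random


def simulate(n, rho=0.8, c=1.0, seed=0):
    """Return n triples (X_1(t), X_2(t), X_3(t-1)) from the process of Example 3.5."""
    rng = random.Random(seed)
    x3_prev = rng.gauss(0.0, 1.0)
    rows = []
    for _ in range(n):
        r = rho if abs(x3_prev) > c else 0.0
        z1, z2, z3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        x1 = z1
        x2 = r * z1 + math.sqrt(1.0 - r * r) * z2  # unit variance, correlation r with x1
        rows.append((x1, x2, x3_prev))
        x3_prev = z3  # X_3 is standard normal white noise
    return rows


def corr(pairs):
    """Empirical correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


rows = simulate(5000)
above = [(x1, x2) for x1, x2, x3 in rows if abs(x3) > 1.0]
below = [(x1, x2) for x1, x2, x3 in rows if abs(x3) <= 1.0]
print("correlation when |X_3(t-1)| > c: ", round(corr(above), 2))
print("correlation when |X_3(t-1)| <= c:", round(corr(below), 2))
```

Each univariate series looks like standard normal white noise regardless of the past, so a pairwise analysis sees no influence of $X_3$; only the joint behaviour of $X_1$ and $X_2$ reveals the dependence on $X_3(t-1)$.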



4. Global Markov properties 

The interpretation of graphs describing the dependence structure of graphical
models in general is enhanced by global Markov properties that merge the notion
of conditional independence with a purely graph-theoretical concept of separation
allowing one to state whether two subsets of vertices are separated by a third subset
of vertices. In this section, we show that the concept of p-separation introduced
by Levitz et al. (2001) for chain graph models with the AMP Markov property
(Andersson et al. 2001) can be used to obtain global Markov properties in the
present context of graphical time series models. Throughout this section we assume
that conditions (M) and (P) in Proposition 2.4 hold.

4.1. The global AMP Markov property 

We start with some further graphical terminology. Let $G = (V, E)$ be a mixed graph.
Then a path $\pi$ between two vertices $a$ and $b$ in $G$ is a sequence $\pi = (e_1, \ldots, e_n)$ of
edges $e_i \in E$ such that $e_i$ is an edge between $v_{i-1}$ and $v_i$ for some sequence of vertices
$v_0 = a, v_1, \ldots, v_n = b$. The vertices $a$ and $b$ are the endpoints of the path, while
$v_1, \ldots, v_{n-1}$ are the intermediate vertices on the path. Notice that paths may be
self-intersecting since we do not require that the vertices $v_i$ are distinct.

An intermediate vertex $c$ on a path $\pi$ is said to be a p-collider on the path if the
edges preceding and succeeding $c$ on the path either both have an arrowhead at $c$ or
one has an arrowhead at $c$ and the other is a line, i.e. $\to c \leftarrow$, $\to c\, -$, $-\, c \leftarrow$;
otherwise the vertex $c$ is said to be a p-noncollider on the path. A path $\pi$ between
vertices $a$ and $b$ is said to be p-connecting given a set $S$ if

(i) every p-noncollider on the path is not in $S$, and

(ii) every p-collider on the path is in $S$;

otherwise we say the path is p-blocked given $S$.

Definition 4.1 (p-separation). Two vertices $a$ and $b$ in a mixed graph $G$ are
p-separated given a set $S$ if all paths between $a$ and $b$ are p-blocked given $S$. Similarly,
two sets $A$ and $B$ in $G$ are said to be p-separated given $S$ if, for every pair $a \in A$
and $b \in B$, $a$ and $b$ are p-separated given $S$. This will be denoted by $A \bowtie_p B \mid S$.
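Because paths are allowed to be self-intersecting, deciding p-separation reduces to a reachability search over pairs (vertex, edge mark at that vertex on arrival), of which there are only finitely many. The Python sketch below implements this idea; it is one possible implementation rather than an algorithm given in the paper, the function names are hypothetical, and the example edge set is an assumption assembled from edges mentioned in the examples (not a reproduction of Figure 2.1).

```python
from collections import deque

HEAD, TAIL, LINE = "head", "tail", "line"


def incident(v, directed, undirected):
    """Yield (mark at v, other vertex, mark at other vertex) for every edge at v."""
    for (a, b) in directed:  # a -> b
        if a == v:
            yield TAIL, b, HEAD
        if b == v:
            yield HEAD, a, TAIL
    for e in undirected:     # a - b
        if v in e:
            (w,) = set(e) - {v}
            yield LINE, w, LINE


def is_p_collider(m_in, m_out):
    """Both marks are arrowheads, or one is an arrowhead and the other a line."""
    return {m_in, m_out} in ({HEAD}, {HEAD, LINE})


def p_connected(a, b, S, directed, undirected):
    """True if some (possibly self-intersecting) path from a to b is p-connecting given S."""
    start = [(w, m) for _, w, m in incident(a, directed, undirected)]
    seen, queue = set(start), deque(start)
    while queue:
        c, m_in = queue.popleft()
        if c == b:
            return True
        for m_out, d, m_d in incident(c, directed, undirected):
            # c is intermediate: p-colliders must lie in S, p-noncolliders must not
            if is_p_collider(m_in, m_out) == (c in S) and (d, m_d) not in seen:
                seen.add((d, m_d))
                queue.append((d, m_d))
    return False


def p_separated(A, B, S, directed, undirected):
    """True if every path between A and B is p-blocked given S."""
    return not any(p_connected(a, b, S, directed, undirected) for a in A for b in B)


# Assumed example graph on {1, ..., 5}.
D = {(1, 3), (3, 1), (2, 3), (4, 2), (3, 4), (3, 5), (5, 4)}
U = {frozenset({2, 3})}
print(p_separated({1}, {5}, {3, 4}, D, U))
print(p_separated({1}, {4}, {3}, D, U))
```

Stopping at the first arrival at `b` is sufficient here, since any initial segment of a p-connecting path that ends at `b` is itself p-connecting.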
Figure 4.1. Illustration of global AMP Markov property: (a) path between 1
and 4 that is p-connecting given $S \subseteq \{2,5\}$; (b) path between 1 and 4 that is
p-connecting given $S = \{2,3\}$ (or $\{2,3,5\}$); (c) path between 1 and 4 that is
p-connecting given $S = \{3,5\}$ (or $\{3\}$).



We note that the above conditions for p-separation are simpler than those in
Levitz et al. (2001) due to the fact that we consider the larger class of all possibly
self-intersecting paths. The equivalence of the two notions of p-separation is shown
in Appendix D. The following results show that the concept of p-separation can
be applied to graphs encoding dynamic relationships in multivariate time series and
allows reading off conditional independencies among the stochastic processes that
are represented by the vertices in the graph.

Lemma 4.2. Suppose that $X_V$ satisfies the block-recursive Granger-causal Markov
property with respect to the graph $G$. Then, for any disjoint subsets $A$, $B$, and $S$ of
$V$, we have

$$A \bowtie_p B \mid S \;\Rightarrow\; \mathscr{X}_A(t) \perp\!\!\!\perp \mathscr{X}_B(t) \mid \mathscr{X}_S(t) \quad \forall\, t \in \mathbb{Z}.$$

Letting $t$ tend to infinity, we can translate p-separation in the graph into
conditional independence statements for complete subprocesses. For this, we define
$\mathscr{X}_S(\infty) = \bigvee_{t\in\mathbb{Z}} \mathscr{X}_S(t)$ as the $\sigma$-algebra generated by the subprocess $X_S$.

Theorem 4.3. Suppose $X_V$ satisfies the block-recursive Granger-causal Markov
property with respect to the graph $G$. Then, for any disjoint subsets $A$, $B$, and
$S$ of $V$, we have

$$A \bowtie_p B \mid S \;\Rightarrow\; \mathscr{X}_A(\infty) \perp\!\!\!\perp \mathscr{X}_B(\infty) \mid \mathscr{X}_S(\infty).$$

We say that $X_V$ satisfies the global AMP Markov property (GA) with respect to $G$.



Example 4.4. For an illustration of the global AMP Markov property, we consider
again the graph $G$ in Figure 2.1. In this graph, vertices 1 and 4 are not adjacent.
Nevertheless, it can be shown that the two vertices cannot be p-separated by any
set $S \subseteq \{2,3,5\}$: firstly, the path $1 \leftarrow 3 \to 4$ is p-connecting given a set $S$ unless
the set $S$ contains the vertex 3 (Fig. 4.1 a). Secondly, the path $1 \to 3 - 2 \leftarrow 4$
is p-connecting given $S$ whenever both intermediate vertices 2 and 3 belong to $S$
(Fig. 4.1 b). Finally, the path $1 \to 3 \leftarrow 2 \leftarrow 4$ is p-connecting given $S$ if $S$
contains vertex 3 but not 2 (Fig. 4.1 c). Thus, if $X_V$ is a stationary process that
obeys the block-recursive Granger-causal Markov property with respect to $G$, then
the graph $G$ does not encode that $X_1$ and $X_4$ are conditionally independent given
$X_S$ regardless of the choice of $S \subseteq \{2,3,5\}$.

Similarly, it can be shown that vertices 1 and 5 are p-separated given $S = \{3,4\}$:
every path between 1 and 5 that contains the edge $3 \to 5$ or the subpath $3 \to 4 \leftarrow 5$
is p-blocked by vertex 3. All other paths between 1 and 5 contain the subpath
$2 \leftarrow 4 \leftarrow 5$ and, thus, are blocked by vertex 4. It follows that for every process
$X_V$ that satisfies the block-recursive Granger-causal Markov property with respect
to $G$ the components $X_1$ and $X_5$ are conditionally independent given $X_{\{3,4\}}$.

4.2. The global Granger-causal Markov property 

In this section, we apply the concept of pathwise separation to the problem of 
deriving general Granger noncausality relations from mixed graphs. To motivate the 
approach, we firstly consider the graphical VAR(l) model of all trivariate stationary 
processes Xy = (Xi, X 2 , X 3 ) given by 



for i £ 2 with independent and standard normally distributed errors Sy(t), t G Z. 
The associated graph G that encodes the restrictions imposed on the model is shown 
in Figure 14.21 In this graph, the path 3 — ► 2 — > 1 is p-connecting given the empty 
set, which indicates that the components X± and X 3 are, in general, not independent 
in a bivariate analysis. However, an intuitive interpretation of the directed path 
3 — > 2 — > 1 suggests that X 3 Granger-causes X x but not vice versa if only the 
bivariate process Xr lj3 } is considered. Indeed, the block-recursive Granger- causal 
Markov property implies that 



that is, Xi is Granger-noncausal for X 3 with respect to ^{1,3}. Obviously, the p- 
separation criterion is too strong for establishing this Granger-noncausality relation- 
ship between X 3 and X± since it requires that all paths between the two vertices are 
p-blocked whereas it seems sufficient that only certain paths, namely those ending 
with an arrowhead at vertex 3, are p-blocked. 

This suggests the following definitions. A path 71 between two vertices a and b in 
G is said to be b-pointing if it has an arrowhead at the endpoint b. More generally, 
a path it between two disjoint subsets A and B is said to be B-pointing if it is 
6-pointing for some b G B. 

$$
\begin{aligned}
X_1(t+1) &= \phi_{11} X_1(t) + \phi_{12} X_2(t) + \varepsilon_1(t+1),\\
X_2(t+1) &= \phi_{22} X_2(t) + \phi_{23} X_3(t) + \varepsilon_2(t+1),\\
X_3(t+1) &= \phi_{33} X_3(t) + \varepsilon_3(t+1)
\end{aligned}
\tag{4.1}
$$

Figure 4.2. Mixed graph associated with trivariate time series in (4.1).

For the derivation of contemporaneous conditional independencies, we also need
to consider paths with arrowheads at both endpoints; such paths $\pi$ will be called
bi-pointing. Furthermore, let $\pi = (\pi_1, \ldots, \pi_n)$ be a composition of paths $\pi_j$ that are
undirected or bi-pointing. Then $\pi$ is said to be an extended bi-pointing path. In
particular, this implies that any undirected or bi-pointing path is also an extended
bi-pointing path; similarly, the composition $\pi = (\pi_1, \pi_2)$ of two extended bi-pointing
paths $\pi_j$ is again extended bi-pointing. Moreover, every extended bi-pointing path
$\pi$ is of the form $\pi = (u_1, \beta, u_2)$ for some paths $u_1$, $u_2$, and $\beta$ of possibly length
zero, where $u_1$ and $u_2$ are undirected paths and $\beta$ is a bi-pointing path (hence the
term 'extended bi-pointing'). With these definitions, we define the following global
Granger-causal Markov property, which gives a path-oriented criterion for deriving
general Granger noncausality relations from a mixed graph.
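
The path classes defined above can be made concrete in a few lines. The following is a minimal sketch under an assumed encoding that does not come from the paper: a path is a list of steps `(u, mark, v)` read from its first to its last vertex, where `mark` is `"->"`, `"<-"`, or `"--"` for an undirected edge. The test for extended bi-pointing paths uses the decomposition $(u_1, \beta, u_2)$ stated in the text.

```python
def is_pointing(path):
    """b-pointing: the path has an arrowhead at its final endpoint b."""
    return bool(path) and path[-1][1] == "->"

def is_bi_pointing(path):
    """Bi-pointing: arrowheads at both endpoints of the path."""
    return bool(path) and path[0][1] == "<-" and path[-1][1] == "->"

def is_extended_bi_pointing(path):
    """Form (u1, beta, u2): u1, u2 undirected, beta bi-pointing, each possibly empty."""
    marks = [mark for _, mark, _ in path]
    i, j = 0, len(marks)
    while i < j and marks[i] == "--":      # strip the leading undirected part u1
        i += 1
    while j > i and marks[j - 1] == "--":  # strip the trailing undirected part u2
        j -= 1
    core = path[i:j]
    return not core or is_bi_pointing(core)

# The directed path 3 -> 2 -> 1 discussed in the text, read from 3 to 1.
directed_path = [(3, "->", 2), (2, "->", 1)]
# The same path read from 1 to 3.
reversed_path = [(1, "<-", 2), (2, "<-", 3)]
# A hypothetical extended bi-pointing path 0 -- 1 <- 2 -> 3 -- 4.
extended_path = [(0, "--", 1), (1, "<-", 2), (2, "->", 3), (3, "--", 4)]

print(is_pointing(directed_path), is_pointing(reversed_path))
print(is_extended_bi_pointing(extended_path))
```

With this encoding the path $3 \to 2 \to 1$ is 1-pointing but not 3-pointing, which matches the observation that only paths ending with an arrowhead at vertex 3 matter for the noncausality of $X_1$ for $X_3$.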

Definition 4.5 (Global Granger-causal Markov property). Let $X_V$ be a stationary
process and let $G = (V, E)$ be a mixed graph. Then $X_V$ satisfies the global
Granger-causal Markov property (GC) with respect to $G$ if, for all disjoint subsets
$A$, $B$, and $S$ of $V$, the following conditions hold:

(i) if every $B$-pointing path in $G$ between $A$ and $B$ is p-blocked given $S \cup B$ then

$$X_A \nrightarrow X_B \ [\mathcal{X}_{A \cup B \cup S}];$$

(ii) if every extended bi-pointing path in $G$ between $A$ and $B$ is p-blocked given
$A \cup B \cup S$ then

$$X_A \nsim X_B \ [\mathcal{X}_{A \cup B \cup S}].$$

From the definition, it is immediately clear by setting $S = V \setminus (A \cup B)$ that the
global Granger-causal Markov property entails the block-recursive Granger-causal
Markov property. The following theorem shows that in fact, under conditions (M)
and (P), the two Granger-causal Markov properties are equivalent; thus, the global
Granger-causal Markov property may be employed to discuss the dynamic relationships
implied by a graphical time series model defined in terms of the block-recursive
Granger-causal Markov property.

Theorem 4.6. Let $X_V$ be a stationary process and let $G = (V, E)$ be a mixed graph.
Then $X_V$ satisfies the block-recursive Granger-causal Markov property with respect to
$G$ if and only if $X_V$ satisfies the global Granger-causal Markov property with respect
to $G$.

As a consequence of the global Granger-causal Markov property, we find that
p-separation in the graph implies Granger noncausality in both directions and
contemporaneous conditional independence of the variables.

Corollary 4.7. Suppose that the process $X_V$ satisfies the block-recursive
Granger-causal Markov property with respect to a mixed graph $G$. For disjoint subsets $A$, $B$,
and $S$ of $V$, if $A$ and $B$ are p-separated given $S$, then

$$X_A \nrightarrow X_B \ [\mathcal{X}_{A \cup B \cup S}], \quad X_B \nrightarrow X_A \ [\mathcal{X}_{A \cup B \cup S}], \quad \text{and} \quad X_A \nsim X_B \ [\mathcal{X}_{A \cup B \cup S}].$$

The following corollary summarizes the relationships between the various Markov 
properties for graphical time series models. 

Corollary 4.8. The various Granger-causal Markov properties are related as follows:

$$\text{(GC)} \Leftrightarrow \text{(BC)} \Rightarrow \text{(LC)} \Leftrightarrow \text{(PC)}.$$

If additionally condition (2.1) holds, then the four Granger-causal Markov properties
are all equivalent. Furthermore, we have (BC) $\Rightarrow$ (GA).

Proof. The corollary summarizes Theorems 12. 5^ 14. H| and 14.61 □ 
Figure 4.3. Illustration of global Granger-causal Markov property: Three 4-pointing
paths (solid lines) between 1 and 4 that are p-blocked by the set {3,4}.

Figure 4.4. Illustration of global Granger-causal Markov property: Three extended
bi-pointing paths (solid lines) between 1 and 4 that are p-blocked by the set {3,4}.

Example 4.9. For an illustration, we again consider a stationary time series $X_V$
satisfying the block-recursive Granger-causal Markov property with respect to the
graph $G$ in Figure 2.1. In Example 4.4 we have seen that vertices 1 and 4 are
not p-separated given $S = \{3\}$, that is, $X_1$ and $X_4$ are in general not conditionally
independent given $X_3$. We now employ the global Granger-causal Markov property
to examine the dynamic relationships between the components $X_1$ and $X_4$ further.

Firstly, we note that all 4-pointing paths between 1 and 4 are p-blocked given the
set $S = \{3,4\}$. Three instances of such paths are depicted in Figure 4.3: while the
first two paths are p-blocked by vertex 3, the last path is p-blocked by vertex 4 as
an intermediate node. Straightforward considerations show that these three paths
represent the three types of possible 4-pointing paths, ending either with $3 \to 4$, with
$3 \to 5 \to 4$, or with $2 \leftarrow 4 \leftarrow 5 \to 4$, respectively. It follows that $X_1$ does not
Granger-cause $X_4$ with respect to $\mathcal{X}_{\{1,3,4\}}$.

Similarly, we can examine all extended bi-pointing paths between vertices 1 and
4 to show that $X_1$ and $X_4$ are contemporaneously conditionally independent with
respect to $\mathcal{X}_{\{1,3,4\}}$. Figure 4.4 shows three examples of such paths: the first two
are p-blocked by vertex 3 (notice that on the second path, the vertex 3 is once a
p-collider and once a p-noncollider) whereas the last path is p-blocked by vertices
2 and 4. For similar reasons as above, these three paths are exemplary for all
extended bi-pointing paths between 1 and 4, and we conclude that $X_1$ and $X_4$ are
indeed contemporaneously conditionally independent with respect to $\mathcal{X}_{\{1,3,4\}}$.

Finally, we note that every 1-pointing path between 4 and 1 must end with the
directed edge $3 \to 1$. Since this edge has a tail at vertex 3, every such path must
be p-blocked given $S = \{1,3\}$, which implies that $X_4$ does not Granger-cause $X_1$
with respect to $\mathcal{X}_{\{1,3,4\}}$.

5. Discussion 

In this paper, we discussed a graphical modelling approach for multivariate time
series that is based on mixed graphs in which each vertex represents one complete
component series while the edges in the graph reflect possible dynamic
interdependences among the variables of the process. The constraints imposed by the graphs
are formulated in terms of strong Granger noncausality and, thus, allow modelling
arbitrary non-linear dependencies. The graphical modelling approach can help to
reduce the number of parameters involved in modelling high-dimensional non-linear
time series while encoding the constraints on the parameters in a simple graph,
which is easy to visualize and allows an intuitive understanding of the dependencies
in the model.

We have shown that the interpretation of these graphs, which for many models
are built only from pairwise Granger noncausality relations, is enhanced by
so-called global Markov properties, which relate separation properties of the graph to
conditional independence or Granger noncausality statements about the process. In
this paper, we have used the path-oriented concept of p-separation, which allows us to
attribute Granger-causal relationships among the variables to certain pathways in
the graphs.

Our objective has been to provide a general framework for modelling the dynamic
interdependencies in multivariate time series; in particular, we focused on a simple
graphical representation, which has been achieved by representing each component
of a multivariate time series by a single vertex in the associated graph. The presented
approach, however, is not the only possible one, and since the first papers on the
application of graphical models in time series analysis (Lynggaard and Walther 1993;
Brillinger 1996), there has been an increasing interest in the topic (Stanghellini and
Whittaker 1999; Dahlhaus 2000; Reale and Tunnicliffe Wilson 2001; Dahlhaus and
Eichler 2003; Oxley et al. 2004; Moneta and Spirtes 2005; Eichler 2006a,b). All these
approaches are basically restricted to the analysis of linear interdependencies, and most
of them represent each variable at each time point by a separate vertex in the associated
graph. In the following, we briefly compare our approach with alternative graphical
representations and point out possible extensions.



Modelling processes of variables at separate time points 



A more detailed modelling of dependencies among the components of a vector time
series can be achieved by representing each random variable $X_v(t)$ by a different
vertex $v_t$, say, in a graph $G$. This alternative approach has been discussed, for
example, by Reale and Tunnicliffe Wilson (2001), Dahlhaus and Eichler (2003), and
Moneta and Spirtes (2005). On the one hand, it leads to a more flexible class of
graphical models and has the advantage that many of the concepts and methods
that have been developed for the multivariate case carry over to the time series
case. On the other hand, the increased flexibility leads to (sometimes much) larger
graphs, which can easily become unwieldy and difficult to interpret, and it clearly
also aggravates the model selection problem. Moreover, the underlying graph for
such graphical time series models theoretically has infinitely many vertices, and it
is not immediately clear how to prune this graph to a finite representation while
preserving the Markov properties.

Apart from these theoretical and practical issues, we think that a high level of
detail as provided by these models is not always wanted nor always appropriate. We
give two examples. Firstly, Baccalá and Sameshima (2001) proposed a frequency-domain
approach for the discussion of Granger-causal relationships based on the concept
of partial directed coherence. Although this approach still requires the fitting of
VAR models, the identification of interactions is performed in the frequency domain
and hence only relations on the level of Granger noncausality can be identified. The
results in Baccalá and Sameshima (2001) were summarized by path diagrams associated
with the identified VAR model as discussed in Eichler (2006a). Our approach
provides a theoretical framework for such analyses.

Secondly, multivariate time series are often obtained by high-frequency sampling
of continuous-time processes such as EEG recordings or neural spike trains. Here,
our approach yields a graphical representation of the interrelationships that does not
depend (to some extent) on the sampling frequency (e.g., Eichler 2005a). Moreover,
many sophisticated models that have been proposed, for example, for analysing
neural activity do not show a dependence on the past values only at specific lags. For
instance, in the binary time series model discussed in Example 3.4, the conditional
distribution of $X_b(t)$ given the past history $\mathcal{X}_V(t-1)$ depends on another process $X_a$
through the past values $X_a(t-1), \ldots, X_a(t-\tau_b(t))$, where $\tau_b(t)$ is the time elapsed
since the last event of process $X_b$. In other words, the number of lagged variables
$X_a(t-u)$ on which $X_b(t)$ depends varies over time depending on the past of $X_b$
itself. Consequently, it seems inappropriate to break down the dependence of $X_b(t)$
on $X_a$ further into dependencies of $X_b(t)$ on $X_a(t-u)$ as required by the detailed
modelling approach.

m-separation versus p-separation 

The contemporaneous dependence structure of a process $X_V$ can also be described
by conditional independencies of the form

$$\mathcal{X}_A(t+1) \perp\!\!\!\perp \mathcal{X}_B(t+1) \mid \mathcal{X}_V(t),$$

in which case $X_A$ and $X_B$ are said to be contemporaneously independent with respect
to $\mathcal{X}_V$. This alternative approach has been studied by Eichler (2006a) in the context
of weakly stationary processes and linear dependencies.

The most important difference between these two approaches for defining graphical
time series models is that the corresponding composition and decomposition
property

$$\mathcal{X}_A(t+1) \perp\!\!\!\perp \mathcal{X}_B(t+1) \mid \mathcal{X}_V(t)
\;\Leftrightarrow\;
\mathcal{X}_a(t+1) \perp\!\!\!\perp \mathcal{X}_b(t+1) \mid \mathcal{X}_V(t) \quad \forall a \in A,\ \forall b \in B \tag{5.1}$$

does not follow from conditions (M) and (P) but requires additional assumptions
similar to condition (2.1). Furthermore, we note that only the first two conditions
in Proposition 3.1 are sufficient for the above property (5.1). Consequently, the
class of graphical time series models for which the pairwise and the block-recursive
Granger-causal Markov properties are equivalent would be smaller under the alternative
approach based on contemporaneous independence.

Self-loops 

In this paper, we have focused on modelling and analysing the interrelationships
in multivariate time series. Therefore, we have not considered the possibility of
directed self-loops $v \to v$, which could be used to impose additional constraints of
the form $X_S(t+1) \perp\!\!\!\perp \mathcal{X}_S(t) \mid \mathcal{X}_{V \setminus S}(t)$ on a model. We note that, for a discussion of
the dynamic interrelationships among variables, these self-loops are irrelevant. In
fact, it can be shown that two disjoint sets $A$ and $B$ are p-separated given $S$ in a
graph with self-loops if and only if they are also p-separated given $S$ in the same
graph with all self-loops removed. Similar statements can be formulated for pointing
and extended bi-pointing paths.

Non-stationary time series 

One of our main assumptions has been that the considered multivariate time series
are stationary. This assumption, however, has been made mainly for the sake of
simplicity, and the presented graphical modelling approach can easily be extended
to the case of non-stationary time series by requiring that the Granger
noncausality and contemporaneous conditional independence constraints encoded
by a graph hold at all time points in an interval $T \subseteq \mathbb{Z}$, say; in that case, we
say that the time series obeys a Granger-causal Markov property with respect to
the graph over the time interval $T$. This would allow us to consider non-stationary
time series models in which the pattern of dependencies remains fixed whereas the
strength of the dependencies may change over time. An interesting extension would
be models where the graphical structure also changes at certain times. For instance,
Talih and Hengartner (2005) consider covariance selection models for multivariate
time series where changes in the dependence structure occur at random times; this
approach, however, does not model dynamic dependencies among the variables.
Finally, we note that, from a statistical point of view, non-stationary models are of
much less interest than stationary models due to the involved practical problems.

Two important issues have not been addressed in this paper. Firstly, in many
applications there is little prior knowledge about the causal relationships between the
variables, and empirical methods have to be used to find an appropriate graphical
model. This step of model selection is hampered by the large number of possible
models, which makes an exhaustive search infeasible even for moderate dimensions.
Therefore, model search strategies are required to lessen the computational
burden.

A second issue, which is related to the problem of model selection, is the
identification of causal effects. It is clear from the definition of Granger causality that
we may infer the existence of a causal effect from Granger causality only if
all relevant variables are included in a study, whereas the omission of important
variables can lead to spurious causalities. However, Hsiao (1982) noted that such
spurious causalities may vanish if the information set is reduced. In other words,
two processes that both satisfy the pairwise causal Markov property with respect
to a graph $G$ may exhibit different Granger noncausality relations with respect to
partial information sets due to the presence or absence of spurious causalities. Some
concepts of how this observation could be exploited for causal inference have been



discussed in Eichler (2005 



Appendix A. Conditional independence and stochastic processes 

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{F}_1$, $\mathcal{F}_2$, and $\mathcal{F}_3$ sub-$\sigma$-algebras of $\mathcal{F}$.
Here and elsewhere we assume for simplicity that all sub-$\sigma$-algebras are completed.¹
The smallest $\sigma$-algebra generated by $\mathcal{F}_i \cup \mathcal{F}_j$ is denoted as $\mathcal{F}_i \vee \mathcal{F}_j$. Then $\mathcal{F}_1$ and $\mathcal{F}_2$
are said to be independent conditionally on $\mathcal{F}_3$ if $E(X \mid \mathcal{F}_2 \vee \mathcal{F}_3) = E(X \mid \mathcal{F}_3)$ a.s. for
all real-valued, bounded, $\mathcal{F}_1$-measurable random variables $X$. Using the notation
of Dawid (1979) we write $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \mid \mathcal{F}_3 \ [P]$ or $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \mid \mathcal{F}_3$ if the reference to $P$
is clear.

¹ Otherwise the $\sigma$-algebras on both sides of (A.1) need to be completed explicitly.

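Let $\mathcal{F}_i$, $i = 1, \ldots, 4$ be sub-$\sigma$-algebras of $\mathcal{F}$. Then the basic properties of the
conditional independence relation are:

(CI1) $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \mid \mathcal{F}_3 \Rightarrow \mathcal{F}_2 \perp\!\!\!\perp \mathcal{F}_1 \mid \mathcal{F}_3$ (symmetry)

(CI2) $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \vee \mathcal{F}_3 \mid \mathcal{F}_4 \Rightarrow \mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \mid \mathcal{F}_4$ (decomposition)

(CI3) $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \vee \mathcal{F}_3 \mid \mathcal{F}_4 \Rightarrow \mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \mid \mathcal{F}_3 \vee \mathcal{F}_4$ (weak union)

(CI4) $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \mid \mathcal{F}_4$ and $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_3 \mid \mathcal{F}_2 \vee \mathcal{F}_4 \Rightarrow \mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \vee \mathcal{F}_3 \mid \mathcal{F}_4$ (contraction)

In some of the proofs in this paper, we make use of an additional property,

(CI5) $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \mid \mathcal{F}_3 \vee \mathcal{F}_4$ and $\mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_3 \mid \mathcal{F}_2 \vee \mathcal{F}_4 \Rightarrow \mathcal{F}_1 \perp\!\!\!\perp \mathcal{F}_2 \vee \mathcal{F}_3 \mid \mathcal{F}_4$,

which has been called intersection property by Pearl (1988). Unlike the other basic
properties of conditional independence, this property does not hold in general. A
sufficient and necessary condition for (CI5) is given by

$$(\mathcal{F}_2 \vee \mathcal{F}_4) \cap (\mathcal{F}_3 \vee \mathcal{F}_4) = \mathcal{F}_4. \tag{A.1}$$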

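In that case, $\mathcal{F}_2$ and $\mathcal{F}_3$ are said to be measurably separated conditionally on $\mathcal{F}_4$,
denoted by $\mathcal{F}_2 \parallel \mathcal{F}_3 \mid \mathcal{F}_4$ (Florens et al. 1990). We note that conditional measurable
separability preserves most, but not all, of the properties of conditional independence.
For instance, if for some sub-$\sigma$-algebras $\mathcal{F}_i$, $i = 1, \ldots, 4$ of $\mathcal{F}$ we have
$\mathcal{F}_1 \parallel \mathcal{F}_2 \mid \mathcal{F}_3$ and $\mathcal{F}_4 \subseteq \mathcal{F}_2$ then $\mathcal{F}_1 \parallel \mathcal{F}_4 \mid \mathcal{F}_3$. For details, we refer to Chapter 5.2
of Florens et al. (1990).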
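
That the intersection property (CI5) can fail is easy to check on a finite example. The following is a minimal sketch, not taken from the paper: $X_1$ is a fair coin and $X_2 = X_3 = X_1$, with $\mathcal{F}_4$ trivial. Here $\sigma(X_2) \cap \sigma(X_3) = \sigma(X_1)$ is not trivial, so the separation condition (A.1) is violated. The helper names `marginal` and `cond_indep` are hypothetical.

```python
from fractions import Fraction

def marginal(pmf, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0) + p
    return out

def cond_indep(pmf, a, b, c):
    """Check X_a independent of X_b given X_c for index tuples a, b, c.

    Uses P(xa, xb, xc) P(xc) = P(xa, xc) P(xb, xc); combinations in which
    one factor on the right vanishes hold trivially and are skipped.
    """
    p_abc, p_ac = marginal(pmf, a + b + c), marginal(pmf, a + c)
    p_bc, p_c = marginal(pmf, b + c), marginal(pmf, c)
    na, nb = len(a), len(b)
    for k_ac, v_ac in p_ac.items():
        for k_bc, v_bc in p_bc.items():
            xa, xc = k_ac[:na], k_ac[na:]
            xb, xc_other = k_bc[:nb], k_bc[nb:]
            if xc != xc_other:
                continue
            if p_abc.get(xa + xb + xc, 0) * p_c[xc] != v_ac * v_bc:
                return False
    return True

half = Fraction(1, 2)
pmf = {(0, 0, 0): half, (1, 1, 1): half}  # X1 fair coin, X2 = X3 = X1

premise_1 = cond_indep(pmf, (0,), (1,), (2,))   # X1 indep X2 given X3
premise_2 = cond_indep(pmf, (0,), (2,), (1,))   # X1 indep X3 given X2
conclusion = cond_indep(pmf, (0,), (1, 2), ())  # X1 indep (X2, X3)
print(premise_1, premise_2, conclusion)
```

Both premises of (CI5) hold in this example because conditioning on either copy already determines $X_1$, yet $X_1$ is clearly not independent of the pair $(X_2, X_3)$.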



If $\mathcal{F}_i$ is generated by random vectors $X_i$ for $i = 1, \ldots, 4$, then a sufficient
condition for conditional measurable separability of the $X_i$'s and, thus, of the $\mathcal{F}_i$'s is
that the probability measure $P^{X_1, \ldots, X_4}$ is absolutely continuous with respect to a
product measure $\mu$ and has a positive and continuous density. However, if each of
the $\sigma$-algebras $\mathcal{F}_i$ is generated by infinitely many random variables, the condition
is obviously no longer valid. Therefore, we show that, in the case of stationary
processes $X_V$, the above assumption of existence of a positive and continuous density
need only be made for the conditional distribution of $X_V(t+1)$ given its past $\mathcal{X}_V(t)$
if the process is strongly mixing.

Lemma A.1. Suppose that $X_V$ is a stationary process such that condition (P) holds,
and let $Y_1$, $Y_2$ be finite disjoint subsets of $S(t) = \{X_v(s),\ s \le t,\ v \in V\}$. Then

$$Y_1 \parallel Y_2 \mid \sigma\big(S(t) \setminus (Y_1 \cup Y_2)\big). \tag{A.2}$$

Proof. A sufficient condition for (A.2) (Florens et al. 1990, Corollary 5.2.11) is the
existence of a probability measure $P'$ on $(\Omega, \mathcal{X}_V(t))$ such that $P|_{\mathcal{X}_V(t)}$ and $P'$ are
equivalent (i.e. have the same null sets) and

$$Y_1 \perp\!\!\!\perp Y_2 \mid \sigma\big(S(t) \setminus (Y_1 \cup Y_2)\big) \ [P']. \tag{A.3}$$

Let $Z_j = X_V(t-j)$ for $j = 0, \ldots, k-1$ and $Z_k = S(t-k)$. Noting that the
conditional densities $f_{Z_{j,v} \mid Z_k}$ exist and can be derived from the product of the conditional
densities $f_{Z_j \mid Z_{j+1}, \ldots, Z_k}$, we define the probability kernel $Q(z_k, A)$ from $\mathbb{R}^{V \times \mathbb{N}}$ to $\mathbb{R}^{V \times k}$
by

$$Q(z_k, A_0 \times \cdots \times A_{k-1}) = \int_{A_0} \cdots \int_{A_{k-1}} \prod_{j=0}^{k-1} \prod_{v \in V} f_{Z_{j,v} \mid Z_k}(z_{j,v} \mid z_k) \, d\mu^k(z_0, \ldots, z_{k-1}).$$

Then the probability $P'$ on $\mathcal{X}_V(t)$ defined by

$$P'(Z_j \in A_j,\ j = 0, \ldots, k) = \int_{\{Z_k \in A_k\}} Q(Z_k, A_0 \times \cdots \times A_{k-1}) \, dP$$

is equivalent to $P$ and the random variables $Z_{j,v}$ for $j = 0, \ldots, k-1$ and $v \in V$ are
mutually conditionally independent given $Z_k$. This implies in particular (A.3) for $k$
large enough. □

The next result shows that for strongly mixing processes this conditional measurable
separability can also be extended to $\sigma$-algebras $\mathcal{X}_A(t)$ generated by the pasts
$\overline{X}_A(t)$.

Proposition A.2. Suppose that $X_V$ is a stationary process such that conditions
(M) and (P) hold. Then $\mathcal{X}_A(t)$ and $\mathcal{X}_B(t)$ are measurably separated conditionally
on $\mathcal{X}_{V \setminus (A \cup B)}(t)$ for all disjoint subsets $A$ and $B$ of $V$.

For the proof of the proposition, we firstly show that every stationary strongly
mixing process is also conditionally mixing. Let $A \subseteq V$. For any set $S' = \{\omega \in \Omega \mid
\overline{X}_A(t-k-1) \in T\}$ in $\mathcal{X}_A(t-k-1)$, we define the shifted set $S'_j \in \mathcal{X}_A(t-k-j)$
by $S'_j = \{\omega \in \Omega \mid \overline{X}_A(t-k-j) \in T\}$. Then the process $X_V$ is said to be
conditionally mixing (Veijanen 1990) if for all disjoint subsets $A$ and $B$ of $V$, $t \in \mathbb{Z}$,
$S \in \sigma\{X_A(t), \ldots, X_A(t-k)\}$, and $S' \in \mathcal{X}_A(t-k-1)$ we have

$$E\,\big| P(S \cap S'_j \mid \mathcal{X}_B(t)) - P(S \mid \mathcal{X}_B(t)) \, P(S'_j \mid \mathcal{X}_B(t)) \big| \to 0$$

as $j \to \infty$.

Lemma A.3. Let $X_V$ be a stationary process such that condition (M) holds. Then
$X_V$ is also conditionally mixing.

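Proof. Let $F \in \sigma\{X_B(t_1), \ldots, X_B(t_m)\}$ with $m \in \mathbb{N}$ and $t_j \le t$, $1 \le j \le m$. Noting
that

$$E\big[E(1_S \mid \mathcal{X}_B(t)) \, E(1_{S'_j} \mid \mathcal{X}_B(t)) \cdot 1_F\big] = E\big[E(1_{S \cap F} \mid \mathcal{X}_B(t)) \cdot 1_{S'_j}\big]$$

by the law of iterated expectation by conditioning on $\mathcal{X}_B(t)$, we have for
$S \in \sigma\{X_A(t), \ldots, X_A(t-k)\}$ and $S' \in \mathcal{X}_A(t-k-1)$,

$$
\begin{aligned}
\big| E\big[\{E(1_{S \cap S'_j} \mid \mathcal{X}_B(t)) &- E(1_S \mid \mathcal{X}_B(t)) \, E(1_{S'_j} \mid \mathcal{X}_B(t))\} \cdot 1_F\big] \big| \\
&\le \big| E(1_{S \cap S'_j \cap F}) - E(1_{S \cap F}) \, E(1_{S'_j}) \big| \\
&\quad + \big| E\big[E(1_{S \cap F} \mid \mathcal{X}_B^{(n)}(t))\big] \, E(1_{S'_j}) - E\big[E(1_{S \cap F} \mid \mathcal{X}_B^{(n)}(t)) \cdot 1_{S'_j}\big] \big| \\
&\quad + \big| E\big[\{E(1_{S \cap F} \mid \mathcal{X}_B^{(n)}(t)) - E(1_{S \cap F} \mid \mathcal{X}_B(t))\} \cdot 1_{S'_j}\big] \big|,
\end{aligned}
$$

where $\mathcal{X}_B^{(n)}(t) = \sigma\{X_B(t), \ldots, X_B(t-n)\}$. Each of the three terms on the right
side converges to zero as $j - n$ and $n$ tend to infinity. Since the union of all
$\sigma\{X_B(t_1), \ldots, X_B(t_m)\}$ with $m \in \mathbb{N}$ and $t_j \le t$ constitutes a $\cap$-stable generator
of $\mathcal{X}_B(t)$, the assertion of the lemma follows. □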
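Proof of Proposition A.2. Let $A$ and $B$ be disjoint subsets of $V$. We have to show
that $\mathcal{X}_A(t)$, $\mathcal{X}_B(t)$, and $\mathcal{X}_{V \setminus (A \cup B)}(t)$ satisfy (A.1) and hence that

$$\mathcal{X}_{V \setminus B}(t) \cap \mathcal{X}_{V \setminus A}(t) = \mathcal{X}_{V \setminus (A \cup B)}(t) \tag{A.4}$$

for all $t \in \mathbb{Z}$. From Lemma A.1 it follows that, for all $t \in \mathbb{Z}$ and $k \in \mathbb{N}$, the
$\sigma$-algebras $\sigma\{X_A(t), \ldots, X_A(t-k+1)\}$ and $\sigma\{X_B(t), \ldots, X_B(t-k+1)\}$ are measurably
separated conditionally on $\mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_V(t-k)$. Accordingly, we have by the
definition of conditional measurable separability

$$\big(\mathcal{X}_{V \setminus B}(t) \vee \mathcal{X}_V(t-k)\big) \cap \big(\mathcal{X}_{V \setminus A}(t) \vee \mathcal{X}_V(t-k)\big) = \mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_V(t-k)$$

for all $t \in \mathbb{Z}$ and $k \in \mathbb{N}$. Since the $\sigma$-algebras on both sides are monotonically
decreasing as $k$ increases, this yields for $k \to \infty$

$$\bigcap_{k>0} \big[\big(\mathcal{X}_{V \setminus B}(t) \vee \mathcal{X}_V(t-k)\big) \cap \big(\mathcal{X}_{V \setminus A}(t) \vee \mathcal{X}_V(t-k)\big)\big]
= \bigcap_{k>0} \big(\mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_V(t-k)\big)$$

for all $t \in \mathbb{Z}$. In order to establish (A.4), it therefore suffices to show that

$$\bigcap_{k>0} \big[\mathcal{X}_S(t) \vee \mathcal{X}_V(t-k)\big] = \mathcal{X}_S(t) \tag{A.5}$$

for any subset $S$ of $V$. Let $\mathcal{T}$ denote the tail $\sigma$-algebra on the left hand side.
Since by Lemma A.3 the process $X_V$ is conditionally mixing, the $\sigma$-algebras $\mathcal{T}$ and
$\mathcal{X}_{V \setminus S}^{(k)}(t) = \sigma\{X_{V \setminus S}(t), \ldots, X_{V \setminus S}(t-k)\}$ are conditionally independent given $\mathcal{X}_S(t)$.
Thus, we have for all $T \in \mathcal{T}$

$$E(1_T \mid \mathcal{X}_S(t)) = E\big(1_T \mid \mathcal{X}_{V \setminus S}^{(k)}(t) \vee \mathcal{X}_S(t)\big) \to 1_T$$

as $k \to \infty$. Hence $\mathcal{T} \subseteq \mathcal{X}_S(t)$ whereas the converse relation is obvious. □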

Proof of Proposition "K2 The result follows from Proposition IA.2I and the fact that 



(A.l) is a necessary and sufficient condition for (CI5). □ 

Appendix B. Graphical terminology 

We firstly recall some basic graphical definitions used in this paper. In a graph
$G = (V, E)$, if there is a directed edge $a \to b$, we say that $a$ is a parent of $b$ and
$b$ is a child of $a$; similarly, if there is an undirected line $a$ -- $b$, the vertices $a$ and
$b$ are called neighbours. The sets of parents, children and neighbours of a vertex
$a$ are denoted as $\mathrm{pa}(a)$, $\mathrm{ch}(a)$, and $\mathrm{ne}(a)$, respectively. Furthermore, for $A \subseteq V$,
let $\mathrm{pa}(A) = \bigcup_{a \in A} \mathrm{pa}(a) \setminus A$ be the set of all parents of vertices in $A$ that are not
themselves in $A$, and let $\mathrm{ch}(A)$ and $\mathrm{ne}(A)$ be defined similarly.

Next, as in Frydenberg (1990), a node $b$ is said to be an ancestor of $a$ if either
$b = a$ or there exists a directed path $b \to \cdots \to a$ in $G$. The set of all ancestors of
elements in $A$ is denoted by $\mathrm{an}(A)$. Notice that this definition differs from the one
given in Lauritzen (1996). A subset $A$ is called an ancestral set if it contains all its
ancestors, that is, $\mathrm{an}(A) = A$.

Finally, let $G = (V, E)$ and $G' = (V', E')$ be mixed graphs. Then $G'$ is a subgraph
of $G$ if $V' \subseteq V$ and $E' \subseteq E$. If $A$ is a subset of $V$ it induces the subgraph $G_A =
(A, E_A)$ where $E_A$ contains all edges $e \in E$ that have both endpoints in $A$.
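
The set operations recalled above translate directly into code. The following is a minimal sketch under an assumed representation that is not part of the paper: directed edges are pairs `(u, v)` meaning $u \to v$, and undirected edges are unordered pairs stored as tuples. The example graph contains the directed path $3 \to 2 \to 1$ plus a hypothetical undirected edge between 1 and 2.

```python
def parents(directed, A):
    """pa(A): parents of vertices in A that are not themselves in A."""
    A = set(A)
    return {u for (u, v) in directed if v in A} - A

def children(directed, A):
    """ch(A): children of vertices in A that are not themselves in A."""
    A = set(A)
    return {v for (u, v) in directed if u in A} - A

def neighbours(undirected, A):
    """ne(A): vertices outside A joined to A by an undirected edge."""
    A = set(A)
    out = set()
    for (u, v) in undirected:
        if u in A:
            out.add(v)
        if v in A:
            out.add(u)
    return out - A

def ancestors(directed, A):
    """an(A): A together with every vertex that has a directed path into A."""
    an = set(A)
    frontier = list(an)
    while frontier:
        v = frontier.pop()
        for (u, w) in directed:
            if w == v and u not in an:
                an.add(u)
                frontier.append(u)
    return an

def is_ancestral(directed, A):
    """A is ancestral if an(A) = A."""
    return ancestors(directed, A) == set(A)

directed = {(3, 2), (2, 1)}  # the path 3 -> 2 -> 1
undirected = {(1, 2)}        # hypothetical undirected edge, for illustration only

print(ancestors(directed, {1}), is_ancestral(directed, {3}))
```

Note that, following the convention in the text, every vertex counts as its own ancestor, so `ancestors` always returns a superset of its argument.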
In the remainder of this section, we prove some auxiliary results that allow us
to relate separation statements in the full graph to separation statements in
so-called marginal ancestral graphs, which basically reflect the dynamic dependencies
in appropriate subprocesses (see Lemma C.1).

Definition B.1 (Marginal ancestral graph). Let $G = (V, E)$ be a mixed graph
and let $A$ be an ancestral subset of $V$. Then the marginal ancestral graph $G[A] =
(A, E[A])$ induced by $A$ is obtained from the induced subgraph $G_A$ by insertion
of additional undirected edges $a$ -- $b$ whenever there exists an undirected path
between $a$ and $b$ in $G$ that does not intersect $\mathrm{an}(A) \setminus \{a, b\}$.
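
The construction in Definition B.1 can be sketched as a small graph search. This is an illustrative sketch only, using an assumed edge representation (directed edges as pairs `(u, v)` for $u \to v$, undirected edges as pairs in either order); the function name `marginal_ancestral_graph` and the example graph are hypothetical. The set `A` is assumed to be ancestral, so that $\mathrm{an}(A) = A$.

```python
def marginal_ancestral_graph(directed, undirected, A):
    """Return (directed edges, undirected edges) of G[A] for an ancestral set A.

    An undirected edge a -- b is inserted whenever a and b are joined by an
    undirected path whose intermediate vertices all lie outside A; this
    includes the undirected edges of the induced subgraph G_A.
    """
    A = set(A)
    adj = {}
    for u, v in undirected:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    e_dir = {(u, v) for (u, v) in directed if u in A and v in A}
    e_und = set()
    for a in A:
        seen, stack = set(), [a]
        while stack:
            x = stack.pop()
            for y in adj.get(x, ()):
                if y in A:
                    if y != a:
                        e_und.add(frozenset((a, y)))  # path ends at a vertex of A
                elif y not in seen:
                    seen.add(y)                       # keep walking outside A
                    stack.append(y)
    return e_dir, e_und

# Hypothetical graph: 3 -> 2 -> 1 and the undirected path 1 -- 4 -- 3.
directed = {(3, 2), (2, 1)}
undirected = {(1, 4), (4, 3)}
e_dir, e_und = marginal_ancestral_graph(directed, undirected, {1, 2, 3})
print(e_dir, e_und)
```

In this example the vertex 4 is marginalized out, and the undirected path through it is replaced by the single undirected edge between 1 and 3, while the directed edges among the retained vertices are unchanged.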

Lemma B.2. Let $G = (V, E)$ be a mixed graph and $A$, $B$, $S$ disjoint subsets of $V$.
Then $A$ and $B$ are p-separated given $S$ in $G$ if and only if $A$ and $B$ are p-separated
given $S$ in $G[\mathrm{an}(A \cup B \cup S)]$.

Proof. To show necessity, let $\pi = (e_1, \ldots, e_n)$ be a p-connecting path between $A$ and
$B$ given $S$ in $G[\mathrm{an}(A \cup B \cup S)]$. If all edges of $\pi$ are edges in $G$, $\pi$ is also p-connecting given
$S$ in $G$. Thus, we may assume that the edges $e_{j_1}, \ldots, e_{j_m}$ in $\pi$ do not occur in $G$.
These edges are necessarily undirected since all directed edges in $G[\mathrm{an}(A \cup B \cup S)]$ also
occur in $G$. Let $e_{j_k}$ be the edge $v_{j_k - 1}$ -- $v_{j_k}$. Then by definition of the marginal ancestral
graph there exists an undirected path $\varphi_{j_k}$ between $v_{j_k - 1}$ and $v_{j_k}$ which bypasses
$\mathrm{an}(A \cup B \cup S) \setminus \{v_{j_k - 1}, v_{j_k}\}$ and therefore is p-connecting given $S$. Replacing all
edges $e_{j_k}$ in $\pi$ by the corresponding paths $\varphi_{j_k}$ we obtain a new path $\pi'$ which connects
$A$ and $B$ in $G$. This path $\pi'$ is also p-connecting given $S$ since the replacement of $e_{j_k}$
by the undirected and p-connecting path $\varphi_{j_k}$ does not change the p-collider resp.
p-noncollider status of the nodes $v_{j_k - 1}$ and $v_{j_k}$.

Conversely for sufficiency, let $\pi = (e_1, \ldots, e_n)$ be a p-connecting path between $A$
and $B$ given $S$ in $G$. Then all edges in $\pi$ with both endpoints in $\mathrm{an}(A \cup B \cup S)$
also occur in $G[\mathrm{an}(A \cup B \cup S)]$ since $G_{\mathrm{an}(A \cup B \cup S)}$ is a subgraph of $G[\mathrm{an}(A \cup B \cup S)]$. We firstly
show that the endpoints of any directed edge $e_j$ in $\pi$ are in $\mathrm{an}(A \cup B \cup S)$. Let
$e_j = v_{j-1} \to v_j$ (the case $e_j = v_{j-1} \leftarrow v_j$ is treated similarly). Then there exists a
directed subpath $(e_j, \ldots, e_{j+r})$ of maximal length such that either $v_{j+r}$ is an endpoint
of $\pi$ and, thus, in $A \cup B$ or $e_{j+r+1}$ is of the form $v_{j+r}$ -- $v_{j+r+1}$ or $v_{j+r} \leftarrow v_{j+r+1}$.
In the latter case $v_{j+r}$ is a p-collider and, thus, in $S$ since $\pi$ is p-connecting given $S$.
It follows that $v_{j-1}$ and $v_j$ are both in $\mathrm{an}(A \cup B \cup S)$.

Next, if $e_j$ is an edge in $\pi$ that does not occur in $G[\mathrm{an}(A \cup B \cup S)]$, at least one of its
endpoints $v_{j-1}$ and $v_j$ is not in $\mathrm{an}(A \cup B \cup S)$. Thus, there exists an undirected
subpath $\varphi_{i,k} = (e_i, \ldots, e_k)$ with $i \le j \le k$ such that $v_{i-1}, v_k \in \mathrm{an}(A \cup B \cup S)$ but
all intermediate points are not in $\mathrm{an}(A \cup B \cup S)$. In other words, $v_{i-1}$ and $v_k$ are
not separated by $\mathrm{an}(A \cup B \cup S) \setminus \{v_{i-1}, v_k\}$ in $G$ which implies the presence of the
undirected edge $f_{i,k}$ joining $v_{i-1}$ and $v_k$ in $G[\mathrm{an}(A \cup B \cup S)]$. Replacing all undirected subpaths
$\varphi_{i,k}$ with intermediate points not in $\mathrm{an}(A \cup B \cup S)$ by the corresponding edge $f_{i,k}$,
we obtain a path between $A$ and $B$ in $G[\mathrm{an}(A \cup B \cup S)]$ which still has all its p-colliders
in $S$ and all its p-noncolliders outside $S$ and therefore is p-connecting given $S$. □



The following lemma is an adapted version of Proposition 2 in Koster (1999).
The proof is considerably shorter due to the fact that we allow paths to be
self-intersecting.

Lemma B.3. Let $A$, $B$, $S$ be disjoint subsets of $V$. Then $A$ and $B$ are p-separated
given $S$ in $G[\mathrm{an}(A \cup B \cup S)]$ if and only if there exist subsets $A'$ and $B'$ such that $A \subseteq A'$,
$B \subseteq B'$, $A' \cup B' \cup S = \mathrm{an}(A \cup B \cup S)$ and

$$A' \bowtie_p B' \mid S \ \big[G[\mathrm{an}(A \cup B \cup S)]\big].$$

Proof. By Lemma B.2 we may assume that $V = \mathrm{an}(A \cup B \cup S)$. Let $A'$ be the
subset of vertices $v \in V \setminus (B \cup S)$ such that $v \bowtie_p B \mid S \ [G]$, and set $B' = V \setminus (A' \cup S)$.
Then $A'$ and $B$ are obviously p-separated given $S$. Thus, we have to show that
$a$ and $b'$ are p-separated given $S$ whenever $a \in A'$ and $b' \in B' \setminus B$. Suppose to the
contrary that there exists a p-connecting path $\pi$ between some $a \in A'$ and $b' \in B' \setminus B$.
Since $A'$ contains all vertices in $V \setminus (B \cup S)$ that are p-separated from $B$ given $S$,
there exists a p-connecting path $\pi'$ between $b'$ and some $b \in B$. Furthermore, since
$b' \in \mathrm{an}(A \cup B \cup S) \setminus (A \cup B \cup S)$ there exists some vertex $v \in A \cup B \cup S$ and a directed
path $\omega = b' \to \cdots \to v$ with no intermediate vertices in $A \cup B \cup S$. Denoting by
$\bar{\omega}$ the reverse path of $\omega$, that is, $\bar{\omega} = v \leftarrow \cdots \leftarrow b'$, we may compose a path $\tilde{\pi}$
between $A$ and $B$ by

(i) $\tilde{\pi} = (\bar{\omega}, \pi')$ if $v \in A$,

(ii) $\tilde{\pi} = (\pi, \omega)$ if $v \in B$, and

(iii) $\tilde{\pi} = (\pi, \omega, \bar{\omega}, \pi')$ if $v \in S$.

We note that the directed path $\omega$ is p-connecting given $S$ since it has no intermediate
vertices in $S$. Furthermore, $b' \notin S$ is a p-noncollider on $\tilde{\pi}$ in each of these cases and
$v \in S$ is a p-collider on $\tilde{\pi}$ in case (iii). Hence $\tilde{\pi}$ is a p-connecting path between $A$
and $B$ given $S$ which contradicts our assumption.

The opposite implication is obvious because of the elementwise definition of
p-separation. □

Because of Lemmas B.2 and B.3 it is often sufficient in the proofs to consider
only the case of $A \bowtie_p B \mid S$ with $S = V \setminus (A \cup B)$. In this case, p-separation can
be characterized in terms of pure-collider paths (paths on which every intermediate
node is a collider) or in terms of local configurations.

Lemma B.4. Let $G$ be a mixed graph and let $A$ and $B$ be two disjoint subsets of
$V$. Then the following statements are equivalent:

(i) $A \bowtie_p B \mid V \setminus (A \cup B)$;

(ii) $A$ and $B$ are not connected by a pure-collider path;

(iii) $(A \cup \mathrm{ch}(A)) \cap (B \cup \mathrm{ch}(B)) = \emptyset$ and $\mathrm{ne}(A \cup \mathrm{ch}(A)) \cap (B \cup \mathrm{ch}(B)) = \emptyset$.

Proof. This observation follows directly from the definition of p-separation and
pure-collider paths. □
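
Condition (iii) of Lemma B.4 only involves children and neighbours and is therefore cheap to evaluate. The following is a minimal sketch under an assumed edge representation (directed edges as pairs `(u, v)` for $u \to v$, undirected edges as pairs in either order); the function names are hypothetical, and the example uses the directed path $3 \to 2 \to 1$ without undirected edges.

```python
def ch(directed, A):
    """Children of vertices in A that are not themselves in A."""
    A = set(A)
    return {v for (u, v) in directed if u in A} - A

def ne(undirected, A):
    """Vertices outside A joined to A by an undirected edge."""
    A = set(A)
    out = set()
    for (u, v) in undirected:
        if u in A:
            out.add(v)
        if v in A:
            out.add(u)
    return out - A

def separated_given_rest(directed, undirected, A, B):
    """Evaluate condition (iii) of Lemma B.4 for disjoint sets A and B."""
    A, B = set(A), set(B)
    a_side = A | ch(directed, A)
    b_side = B | ch(directed, B)
    if a_side & b_side:
        return False  # a direct edge or a common child links A and B
    return not (ne(undirected, a_side) & b_side)

directed = {(3, 2), (2, 1)}  # the path 3 -> 2 -> 1
undirected = set()           # assumption: no undirected edges in this example

print(separated_given_rest(directed, undirected, {1}, {3}))
print(separated_given_rest(directed, undirected, {1}, {2}))
```

For the path $3 \to 2 \to 1$ the check confirms that 1 and 3 satisfy condition (iii), since the only connecting path passes through the noncollider 2, whereas 1 and 2 are joined by a direct edge and fail it.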

Appendix C. Proofs 

Proof of Theorem 2.5. Setting $A = \{a\}$ in (BC), we immediately obtain (LC).
Conversely, since $\mathrm{pa}(a) \cup \{a\} \subseteq \mathrm{pa}(A) \cup A$, we have by (LC) together with (CI2) and
(CI3)

$$X_{V \setminus (\mathrm{pa}(A) \cup A)} \nrightarrow X_a \ [\mathcal{X}_V] \quad \forall a \in A,$$

which, under condition (2.1), implies the first part of (BC). The second part is
proved similarly.

To see that (LC) and (PC) are equivalent, we note that, under conditions (M) and
(P), the intersection property leads to the following composition and decomposition
property for Granger noncausality relations:

$$X_A \nrightarrow X_B \ [\mathcal{X}_V] \;\Leftrightarrow\; X_a \nrightarrow X_B \ [\mathcal{X}_V] \quad \forall a \in A. \tag{C.1}$$

Similarly, we have for contemporaneous conditional independence relations

$$X_A \nsim X_B \ [\mathcal{X}_V] \;\Leftrightarrow\; X_a \nsim X_b \ [\mathcal{X}_V] \quad \forall a \in A,\ \forall b \in B. \tag{C.2}$$

Taking $A = V \setminus (B \cup \mathrm{pa}(B))$ in (C.1) and $A = V \setminus (B \cup \mathrm{pa}(B))$ in (C.2), we find that
the pairwise and the local Granger-causal Markov property are equivalent. □

Proof of Proposition 3.1. We have to show that each of the three conditions (i), (ii),
and (iii) implies

$$X_A \nrightarrow X_b \ [\mathcal{X}_V] \quad \forall b \in B \;\Rightarrow\; X_A \nrightarrow X_B \ [\mathcal{X}_V] \tag{C.3}$$

for any two disjoint subsets $A, B \subseteq V$.

For the first case, let $H$ be the Hilbert space of all square integrable random
variables on $(\Omega, \mathcal{F}, P)$. Furthermore, for $U \subseteq V$, let $H_U(t)$ be the closed subspace
spanned by $\{X_u(s),\ u \in U,\ s \le t\}$ and let $H_U^{\perp}(t)$ be its orthogonal complement.
Then we have for any Y G Hy\ A (t) 

cov (Xj(t+l),y) = cov (X(t+1),Y) =0 V6 G 5, 

which for a Gaussian process implies (C.3).

Next, suppose that condition (ii) holds and that $X_A$ is Granger-noncausal for $X_b$
with respect to $\mathcal{X}_V$ for all $b \in B$. Then, the conditional distribution $P^{X_B(t+1) \mid \overline{X}_V(t)}$
satisfies

$$P^{X_B(t+1) \mid \overline{X}_V(t)} = \bigotimes_{b \in B} P^{X_b(t+1) \mid \overline{X}_V(t)} = \bigotimes_{b \in B} P^{X_b(t+1) \mid \overline{X}_{V \setminus A}(t)}$$

and, thus, is $\mathcal{X}_{V \setminus A}(t)$-measurable, which proves (C.3).
Finally, if condition (iii) holds, we have

$$X_B(t+1) - E[X_B(t+1) \mid \mathcal{X}_V(t)] \perp\!\!\!\perp \mathcal{X}_A(t) \mid \mathcal{X}_{V \setminus A}(t).$$

Since the left hand side of (C.3) implies that $E[X_B(t+1) \mid \mathcal{X}_V(t)]$ is $\mathcal{X}_{V \setminus A}(t)$-measurable,
we obtain $X_B(t+1) \perp\!\!\!\perp \mathcal{X}_A(t) \mid \mathcal{X}_{V \setminus A}(t)$, which completes the proof. □

For the proofs of the equivalence of the block-recursive and the global Granger-causal 
Markov property, it will be convenient to restrict ourselves to mixed graphs for 
ancestral subsets. Due to the additional undirected edges inserted into the marginal 
ancestral graph $G[\mathrm{an}(A)]$, the subprocess $X_{\mathrm{an}(A)}$ satisfies the pairwise Granger-causal 
Markov property with respect to $G[\mathrm{an}(A)]$ if $X_V$ did so with respect to $G$. The 
following lemma shows that the same inheritance property also holds for the 
block-recursive Granger-causal Markov property. 

Lemma C.1. Suppose that $X_V$ satisfies the block-recursive Granger-causal Markov 
property with respect to the mixed graph $G$, and let $U \subseteq V$. Then the subprocess 
$X_{\mathrm{an}(U)}$ satisfies the block-recursive Granger-causal Markov property with respect to 
the marginal ancestral graph $G[\mathrm{an}(U)]$. 
Figure C.1. Pure-collider paths between two vertices $a$ and $b$; panels (a), (b), (c). 

Proof. Let $H = G[\mathrm{an}(U)]$ and let $A$ be a subset of $\mathrm{an}(U)$. We firstly note that, since 
$\mathrm{an}(U)$ is an ancestral set and, thus, contains the parents of all its subsets $A$, the 
parents of $A$ in both graphs are the same, i.e. $P = \mathrm{pa}_G(A) = \mathrm{pa}_H(A)$. By the 
block-recursive Granger-causal Markov property of $X_V$ with respect to $G$, $X_{V \setminus (P \cup A)}$ does 
not Granger-cause $X_A$ with respect to $\mathcal{X}_V$, which by (CI2) implies that $X_{\mathrm{an}(U) \setminus (P \cup A)}$ 
is Granger-noncausal for $X_A$ with respect to the smaller filtration $\mathcal{X}_{\mathrm{an}(U)}$ as required 
by the block-recursive Granger-causal Markov property of $X_{\mathrm{an}(U)}$ with respect to $H$. 

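Next, let $N = \mathrm{ne}_H(A)$. Then $A$ and $\mathrm{an}(U) \setminus N$ are separated by $N$ in $H^{\mathrm{u}}$, that 
is, $a$ and $b$ are not adjacent in the undirected subgraph $H^{\mathrm{u}}$ whenever $a \in A$ and 
$b \in \mathrm{an}(U) \setminus N$. By definition of $H$, this implies that $A$ and $\mathrm{an}(U) \setminus N$ are separated 
by $N$ in $G^{\mathrm{u}}$. By the block-recursive Granger-causal Markov property, it follows that 

$$\mathcal{X}_A(t+1) \perp\!\!\!\perp \mathcal{X}_{\mathrm{an}(U) \setminus N}(t+1) \mid \mathcal{X}_N(t+1) \vee \mathcal{X}_V(t)$$

and 

$$\mathcal{X}_A(t+1) \perp\!\!\!\perp \mathcal{X}_{V \setminus \mathrm{an}(U)}(t) \mid \mathcal{X}_{\mathrm{an}(U)}(t).$$

Combining these two relations, we find that $X_A$ and $X_{\mathrm{an}(U) \setminus N}$ are contemporaneously 
conditionally independent with respect to $\mathcal{X}_{\mathrm{an}(U)}$ as required by the block-recursive 
Granger-causal Markov property of $X_{\mathrm{an}(U)}$ with respect to the graph $H$. □ 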

Proof of Lemma 4.2. For notational convenience, we may assume in view of Lemma 
C.1 that $\mathrm{an}(A \cup B \cup S) = V$ and, thus, $G[\mathrm{an}(A \cup B \cup S)] = G$. Furthermore, Lemma B.2 
implies that, if $A \bowtie_p B \mid S$ in the graph $G$, there exists a partition $(A^*, B^*, S)$ of $V$ 
such that $A \subseteq A^*$, $B \subseteq B^*$, and $A^* \bowtie_p B^* \mid S$. Thus, without loss of generality, we 
may assume that $S = V \setminus (A \cup B)$. 

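With these simplifications, it suffices to show that $A \bowtie_p B \mid V \setminus (A \cup B)$ implies 

$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \mid \mathcal{X}_{V \setminus (A \cup B)}(t) \tag{C.4}$$

for all $t \in \mathbb{Z}$. To this end, we firstly show that 

$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \mid \mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_{A \cup B}(t-k) \tag{C.5}$$

for all $t \in \mathbb{Z}$ and $k \in \mathbb{N}$. 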

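We proceed by induction on $k$. For $k = 1$, we note that $B \subseteq V \setminus (A \cup \mathrm{ne}(A))$ and 
hence 

$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \mid \mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_V(t-1). \tag{C.6}$$

Since $\mathcal{X}_{V \setminus (A \cup B)}(t)$ contains $\mathcal{X}_{V \setminus (A \cup B)}(t-1)$, this is (C.5) for $k = 1$. 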

For the induction step $k \to k + 1$ assume that 

$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \mid \mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_{A \cup B}(t-k) \tag{C.7}$$

for all $t \in \mathbb{Z}$. Let $C_A = A \cup \mathrm{ch}(A)$. Then, since by the block-recursive Granger-causal 
Markov property $X_A$ is Granger-noncausal for $X_{V \setminus C_A}$, we have 

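$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_{V \setminus C_A}(t+1) \mid \mathcal{X}_{V \setminus A}(t) \vee \mathcal{X}_{A \cup B}(t-k)$$

and further with (C.7) and (CI4) 

$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \vee \mathcal{X}_{V \setminus C_A}(t+1) \mid \mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_V(t-k).$$

With $N_A = \mathrm{ne}(A \cup \mathrm{ch}(A)) = \mathrm{ne}(C_A)$, we obtain by (CI2) and (CI3) 

$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \vee \mathcal{X}_{V \setminus (C_A \cup N_A)}(t+1) \mid \mathcal{X}_{N_A}(t+1) \vee \mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_V(t-k). \tag{C.8}$$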

Next, we note that by Lemma El B U ch(B) C V\(CU U N A ) and thus 

$$\mathcal{X}_{C_A}(t+1) \perp\!\!\!\perp \mathcal{X}_B(t) \mid \mathcal{X}_{N_A}(t+1) \vee \mathcal{X}_{V \setminus B}(t).$$

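Furthermore, $X_{C_A}$ and $X_{V \setminus (C_A \cup N_A)}$ are contemporaneously conditionally independent 
and thus 

$$\mathcal{X}_{C_A}(t+1) \perp\!\!\!\perp \mathcal{X}_{V \setminus (C_A \cup N_A)}(t+1) \mid \mathcal{X}_{N_A}(t+1) \vee \mathcal{X}_V(t).$$

Together with the previous relation, we obtain by (CI4) 

$$\mathcal{X}_{C_A}(t+1) \perp\!\!\!\perp \mathcal{X}_B(t) \vee \mathcal{X}_{V \setminus (C_A \cup N_A)}(t+1) \mid \mathcal{X}_{N_A}(t+1) \vee \mathcal{X}_{V \setminus B}(t).$$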

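By (C.8) and the intersection property (CI5), this yields 

$$\mathcal{X}_A(t) \vee \mathcal{X}_{C_A}(t+1) \perp\!\!\!\perp \mathcal{X}_B(t) \vee \mathcal{X}_{V \setminus (C_A \cup N_A)}(t+1) \mid \mathcal{X}_{N_A}(t+1) \vee \mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_V(t-k).$$

Since this relation holds for all $t \in \mathbb{Z}$, we have by (CI2) and (CI3) 

$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \mid \mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_{A \cup B}(t-k-1),$$

which completes the induction step. 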

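To show that (C.5) entails (C.4), we note that for $k \to \infty$ (C.5) yields 

$$\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \mid \bigcap_{k>0} \bigl[\mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_{A \cup B}(t-k)\bigr]$$

for all $t \in \mathbb{Z}$. As in the proof of Proposition A.2 it follows that 

$$\bigcap_{k>0} \bigl[\mathcal{X}_{V \setminus (A \cup B)}(t) \vee \mathcal{X}_{A \cup B}(t-k)\bigr] = \mathcal{X}_{V \setminus (A \cup B)}(t),$$

which concludes the proof of (C.4). □ 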

Proof of Theorem \4-Z\ Suppose that A, B, and S are disjoint subsets of V such that 
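$A \bowtie_p B \mid S$. Let $\xi$ be any $\mathcal{X}_A(\infty)$-measurable random variable with $\mathrm{E}|\xi| < \infty$, 
where $\mathcal{X}_A(\infty) = \bigvee_{t \in \mathbb{Z}} \mathcal{X}_A(t)$ denotes the $\sigma$-algebra generated by $X_A$. Then $\xi(t) = 
\mathrm{E}(\xi \mid \mathcal{X}_A(t))$ is a martingale and converges to $\xi$ in $L^1$ as $t$ tends to infinity. Thus, we 
obtain on the one hand, as $t \to \infty$, 

$$\mathrm{E}(\xi(t) \mid \mathcal{X}_{S \cup B}(t)) \to \mathrm{E}(\xi \mid \mathcal{X}_{S \cup B}(\infty)) \quad \text{in } L^1. \tag{C.9}$$

On the other hand, since $\mathcal{X}_A(t) \perp\!\!\!\perp \mathcal{X}_B(t) \mid \mathcal{X}_S(t)$ by Lemma 4.2 we have, as $t \to \infty$, 

$$\mathrm{E}(\xi(t) \mid \mathcal{X}_{S \cup B}(t)) = \mathrm{E}(\xi(t) \mid \mathcal{X}_S(t)) \to \mathrm{E}(\xi \mid \mathcal{X}_S(\infty)) \quad \text{in } L^1. \tag{C.10}$$

Since the limits in (C.9) and (C.10) must be equal in $L^1$ and, thus, also almost 
surely, this proves that $\mathcal{X}_A(\infty) \perp\!\!\!\perp \mathcal{X}_B(\infty) \mid \mathcal{X}_S(\infty)$. □ 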
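
The final step rests on the usual characterization of conditional independence through conditional expectations. As a brief sketch, write $\xi$ for the integrable variable of the proof, measurable with respect to the $\sigma$-algebra $\mathcal{X}_A(\infty)$ generated by $X_A$; the event $F$ below is introduced here only for illustration. Equality of the two limits gives

$$\mathrm{E}\bigl(\xi \mid \mathcal{X}_{S \cup B}(\infty)\bigr) = \mathrm{E}\bigl(\xi \mid \mathcal{X}_S(\infty)\bigr) \quad \text{a.s.},$$

and the choice $\xi = 1_F$ with $F \in \mathcal{X}_A(\infty)$ yields

$$P\bigl(F \mid \mathcal{X}_{S \cup B}(\infty)\bigr) = P\bigl(F \mid \mathcal{X}_S(\infty)\bigr) \quad \text{a.s. for all } F \in \mathcal{X}_A(\infty),$$

which is one of the equivalent formulations of the conditional independence of $\mathcal{X}_A(\infty)$ and $\mathcal{X}_B(\infty)$ given $\mathcal{X}_S(\infty)$.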
Figure C.2. (a) bi-directed path; (b) extended bi-directed path. 

Proof of Theorem \4-b\ For the proof of the first part of the global Granger-causal 
Markov property, let $A$ and $B$ be subsets such that all $B$-pointing paths between $A$ and 
$B$ are p-blocked given $B \cup S$. We note that each $B$-pointing path $\pi$ is of the form 
$\pi = (\tilde\pi, e)$, where $e$ is a directed edge $u \to b$ for some $b \in B$. Thus, $\pi$ is p-blocked 
given $B \cup S$ if and only if $u \in B \cup S$ or $\tilde\pi$ is p-blocked given $B \cup S$. Therefore, 
if all $B$-pointing paths between $A$ and $B$ are p-blocked given $B \cup S$, then $A$ and 
$\mathrm{pa}(B) \setminus (B \cup S)$ are p-separated given $B \cup S$ and we obtain by Lemma 4.2 

$$\mathcal{X}_{\mathrm{pa}(B) \setminus (B \cup S)}(t) \perp\!\!\!\perp \mathcal{X}_A(t) \mid \mathcal{X}_{B \cup S}(t).$$

Since, in particular, every edge $a \to b$ for some $a \in A$ and $b \in B$ is p-connecting, it 
follows that $A$ and $\mathrm{pa}(B)$ are disjoint. Thus, we get by the block-recursive 
Granger-causal Markov property 

$$\mathcal{X}_B(t+1) \perp\!\!\!\perp \mathcal{X}_A(t) \mid \mathcal{X}_{\mathrm{pa}(B) \cup S \cup B}(t).$$

Applying the contraction property to this and the previous relation, we find 
that $X_A$ is Granger-noncausal for $X_B$ with respect to $\mathcal{X}_{A \cup B \cup S}$. 

For the proof of the second part, let $U = A \cup B \cup S$ and assume that every 
extended bi-pointing path between $A$ and $B$ is p-blocked given $U$. Firstly, we note 
that a bi-directed path $\pi$ between $a \in A$ and $b \in B$ is of the form $\pi = (e_1, \tilde\pi, e_n)$, 
where $e_1$ and $e_n$ are directed edges $a \leftarrow v_1$ and $v_{n-1} \to b$, respectively (Fig. C.2 a). 
Thus, $\pi$ is p-blocked given $U$ if and only if $v_1 \in U$, $v_{n-1} \in U$, or $\tilde\pi$ is p-blocked given $U$. 
This implies that, if all bi-directed paths between $A$ and $B$ are p-blocked given $U$, 
$\mathrm{pa}(A) \setminus U$ and $\mathrm{pa}(B) \setminus U$ are p-separated given $U$. 

The argument can be adapted to the more general case of extended bi-directed 
paths: let $S_A$ be the subset of $s \in S$ that are linked to $A$ by an undirected 
p-connecting path and define $S_B$ similarly. Then, intuitively, $\mathrm{pa}(A \cup S_A) \setminus U$ and 
$\mathrm{pa}(B \cup S_B) \setminus U$ should be p-separated given $U$ (Fig. C.2 b). 

More precisely, let $S_0 = \{s \in S \mid \mathrm{pa}(s) \subseteq U\}$, which in particular includes all 
$s \in S$ that have no parents. Furthermore, let $S_A$ be the set of all $s \in S \setminus S_0$ such 
that every extended bi-pointing path between $s$ and $B$ is p-blocked given $U$ and set 
$S_B = S \setminus (S_0 \cup S_A)$. Notice that for all $s \in S_B$ there exists an extended bi-pointing 
path between $s$ and $B$ that is p-connecting given $U$. We show that every extended 
bi-pointing path between $A \cup S_A$ and $B \cup S_B$ is p-blocked given $U$. Since all extended 
bi-pointing paths between $A \cup S_A$ and $B$ must be p-blocked by assumption on $A$ 
and $B$ or by definition of $S_A$, we only have to show that all extended bi-pointing paths 
between $A \cup S_A$ and $S_B$ are p-blocked given $U$. Suppose to the contrary that $\pi$ is 
an extended bi-pointing path between $A \cup S_A$ and $s \in S_B$ that is p-connecting given 
$U$. Then, as mentioned above, there exists a p-connecting extended bi-pointing 
path $\pi_s$ between $s$ and $B$. If $s$ is a p-collider on the composed extended bi-pointing 
path $\pi' = (\pi, \pi_s)$ then $\pi'$ is p-connecting given $U$, contradicting the assumption 
about $A$ and $B$. Otherwise, if $s$ is a p-noncollider, the two adjacent edges must be 
undirected (i.e. — $s$ —) because extended bi-pointing paths never have a tail at 
either endpoint. Noting that there exists some $v \in \mathrm{pa}(s) \setminus U$ since $s \notin S_0$, we set 
$\pi' = (\pi, s \leftarrow v \to s, \pi_s)$. Then $s$ is both times a p-collider on $\pi'$ and, thus, $\pi'$ is 
p-connecting given $U$. Since $\pi'$ is also an extended bi-pointing path, this contradicts 
again the assumption about $A$ and $B$. 

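Since in particular all bi-pointing paths between $A \cup S_A$ and $B \cup S_B$ are p-blocked 
given $U$, we have 

$$\mathrm{pa}(A \cup S_A) \setminus U \;\bowtie_p\; \mathrm{pa}(B \cup S_B) \setminus U \mid U.$$

Thus, we obtain by Lemma 4.2 

$$\mathcal{X}_{\mathrm{pa}(A \cup S_A) \setminus U}(t) \perp\!\!\!\perp \mathcal{X}_{\mathrm{pa}(B \cup S_B) \setminus U}(t) \mid \mathcal{X}_U(t). \tag{C.11}$$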

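Since $\mathrm{pa}(A \cup S_A) \subseteq V \setminus \mathrm{pa}(B \cup S_B)$ and $\mathrm{pa}(S_0) \subseteq U$, the block-recursive 
Granger-causal Markov property implies 

$$\mathcal{X}_{B \cup S_B \cup S_0}(t+1) \perp\!\!\!\perp \mathcal{X}_{\mathrm{pa}(A \cup S_A) \setminus U}(t) \mid \mathcal{X}_{U \cup \mathrm{pa}(B \cup S_B)}(t) \tag{C.12}$$

and further with (C.11) 

$$\mathcal{X}_{B \cup S_B \cup S_0}(t+1) \perp\!\!\!\perp \mathcal{X}_{\mathrm{pa}(A \cup S_A) \setminus U}(t) \mid \mathcal{X}_U(t). \tag{C.13}$$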

Moreover, since undirected paths are special cases of extended bi-pointing paths, 
we find that every undirected path between $A \cup S_A$ and $B \cup S_B$ intersects $S_0$. Then, 
by a standard argument of graph theory (e.g., Whittaker 1990, Lemma 3.3.3), there 
exists a partition $(A^*, B^*, S_0)$ of $V$ such that $A \cup S_A \subseteq A^*$, $B \cup S_B \subseteq B^*$, and 
every undirected path between $A^*$ and $B^*$ intersects $S_0$; in particular, this implies 
$\mathrm{ne}(A \cup S_A) \cap B^* = \emptyset$. Thus, we obtain by the block-recursive Granger-causal Markov 
property 

$$\mathcal{X}_{A \cup S_A}(t+1) \perp\!\!\!\perp \mathcal{X}_{B \cup S_B}(t+1) \mid \mathcal{X}_V(t) \vee \mathcal{X}_{S_0}(t+1).$$

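Together with 

$$\mathcal{X}_{A \cup S_A \cup S_0}(t+1) \perp\!\!\!\perp \mathcal{X}_{V \setminus (U \cup \mathrm{pa}(A \cup S_A))}(t) \mid \mathcal{X}_{U \cup \mathrm{pa}(A \cup S_A)}(t),$$

which also follows from the block-recursive Granger-causal Markov property, this 
implies 

$$\mathcal{X}_{A \cup S_A}(t+1) \perp\!\!\!\perp \mathcal{X}_{B \cup S_B}(t+1) \mid \mathcal{X}_{U \cup \mathrm{pa}(A \cup S_A)}(t),\ X_{S_0}(t+1). \tag{C.14}$$

Applying (CI4) to (C.13) and (C.14), we finally obtain 

$$\mathcal{X}_{A \cup S_A}(t+1) \perp\!\!\!\perp \mathcal{X}_{B \cup S_B}(t+1) \mid \mathcal{X}_U(t) \vee \mathcal{X}_{S_0}(t+1),$$

from which the desired relation follows by (CI2). 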

Finally, to see that (GC) entails (BC), let S = pa(B) and A = V\S for an 
arbitrary subset B of V. Then the first relation in (BC) follows directly from the 
global Granger-causal Markov property. The second relation in (BC) can be derived 
similarly. □ 

Proof of Corollary \4. r l\ Suppose that all paths between A and B are p-blocked given 
$S$. We show that then all $B$-pointing paths between $A$ and $B$ are p-blocked given 
$S \cup B$, which implies by the global Granger-causal Markov property that $X_A$ is 
Granger-noncausal for $X_B$ with respect to $\mathcal{X}_{A \cup B \cup S}$. 

We firstly note that, in particular, every $B$-pointing path $\pi$ between $A$ and $B$ 
is p-blocked given $S$ and, if $\pi$ does not contain any intermediate nodes in $B$, also 
p-blocked given $S \cup B$. Now, suppose that $\pi$ is a $B$-pointing path between $A$ and 
$B$ with some intermediate nodes in $B$. Then $\pi$ can be partitioned as $\pi = (\pi_1, \pi_2)$ 
where $\pi_1$ is a path between $A$ and some $b \in B$ with no intermediate nodes in $B$. 
Because of the assumption, the path $\pi_1$ is p-blocked given $S$ and, since it has no 
intermediate vertices in $B$, also given $S \cup B$. It follows that all $B$-pointing paths 
between $A$ and $B$ are p-blocked given $S \cup B$. 

The other two cases $X_B \nrightarrow X_A\ [\mathcal{X}_{A \cup B \cup S}]$ and $X_A \nsim X_B\ [\mathcal{X}_{A \cup B \cup S}]$ can be derived 
similarly. □ 



Appendix D. p-separation in mixed graphs 

The definition of p-separation presented in this paper is based on paths that may 
be self-intersecting. This leads to simpler conditions than in the original definition 
by Levitz et al. (2001). The latter is formulated in terms of paths on which all 
intermediate vertices are distinct, that is, these paths are not self-intersecting; such 
paths are called trails. According to Levitz et al. (2001), a trail $\pi$ between vertices $a$ 
and $b$ is said to be p-active relative to $S$ if 

(i) every p-collider (head-no-tail node) on $\pi$ is in $\mathrm{an}(S)$, and 

(ii) every p-noncollider $v$ is either not in $S$ or it has two adjacent undirected edges 
(— $v$ —) and $\mathrm{pa}(v) \setminus S \neq \emptyset$. 

Otherwise the trail is p-blocked relative to S. Let A, B, and S be disjoint subsets of 
V. Then S p-separates A and B if all trails between A and B are p-blocked relative 
to S. 
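
Conditions (i) and (ii) can be checked mechanically for a given trail. The following is a minimal sketch, not part of the original paper: the function names `ancestors` and `is_p_active`, the encoding of a trail as a list of steps `(x, kind, y)` with `kind` one of `"->"`, `"<-"`, `"--"`, and the convention that $\mathrm{an}(S)$ contains $S$ itself are all assumptions made for illustration. The code does not verify that the steps form a valid trail in the graph.

```python
def ancestors(directed, S):
    """an(S): S together with every vertex that has a directed path into S
    (assumed convention: an(S) contains S itself)."""
    parents = {}
    for u, v in directed:            # directed edge u -> v
        parents.setdefault(v, set()).add(u)
    result, stack = set(S), list(S)
    while stack:
        v = stack.pop()
        for u in parents.get(v, ()):
            if u not in result:
                result.add(u)
                stack.append(u)
    return result


def is_p_active(trail, directed, S):
    """Check conditions (i) and (ii) for a trail given as consecutive steps
    (x, kind, y), where kind is '->' (edge x -> y), '<-' (edge y -> x)
    or '--' (undirected edge)."""
    S = set(S)
    an_S = ancestors(directed, S)
    parents = {}
    for u, v in directed:
        parents.setdefault(v, set()).add(u)
    for (_, k_in, c), (_, k_out, _) in zip(trail, trail[1:]):
        head = k_in == "->" or k_out == "<-"   # arrowhead at c
        tail = k_in == "<-" or k_out == "->"   # tail of a directed edge at c
        if head and not tail:                  # p-collider (head-no-tail)
            if c not in an_S:                  # condition (i) fails
                return False
        elif c in S:                           # p-noncollider inside S
            both_undirected = k_in == "--" and k_out == "--"
            if not (both_undirected and parents.get(c, set()) - S):
                return False                   # condition (ii) fails
    return True
```

For instance, the trail $a \to c \leftarrow b$ in a graph that also contains $c \to s$ is p-active relative to $S = \{s\}$ but not relative to the empty set, and the undirected trail $a$ — $c$ — $b$ is p-active relative to $S = \{c\}$ only if $c$ has a parent outside $S$.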

The following proposition shows that the two notions of p-separation are equiva- 
lent. 

Proposition D.1. Let $G = (V, E)$ be a mixed graph and $A$, $B$, $S$ disjoint subsets 
of $V$. Then there exists a p-active trail between $A$ and $B$ relative to $S$ if and only if 
there exists a p-connecting path between $A$ and $B$ given $S$. 

Proof. Suppose that $\pi$ is a trail between two vertices $a$ and $b$ that is p-active relative 
to $S$. If all p-colliders on $\pi$ are in $S$ and all p-noncolliders are outside $S$, then $\pi$ is 
also p-connecting given $S$. Otherwise, $\pi$ is p-blocked by vertices $u_{j_1}, \ldots, u_{j_r}$ on the 
path. If $u_{j_i}$ is a p-collider then $u_{j_i} \in \mathrm{an}(S)$ since $\pi$ is p-active. Hence there exists 
a directed path $\tau_i = (u_{j_i} \to \cdots \to s_i)$ for some $s_i \in S$ such that all intermediate 
vertices on $\tau_i$ are not in $S$ and we set $\sigma_i = (\tau_i, \bar\tau_i)$, where $\bar\tau_i$ denotes $\tau_i$ traversed in 
reverse. On the other hand, if $u_{j_i}$ is a p-noncollider on $\pi$, then the two edges adjacent 
to $u_{j_i}$ are undirected. Thus, there exists $w_i \in \mathrm{pa}(u_{j_i}) \setminus S$ and we set 
$\sigma_i = (u_{j_i} \leftarrow w_i \to u_{j_i})$. Now, let $\pi_i$ be the subpath 
of $\pi$ between $u_{j_i}$ and $u_{j_{i+1}}$ with $u_{j_0} = a$ and $u_{j_{r+1}} = b$ and set 

$$\pi' = (\pi_0, \sigma_1, \pi_1, \sigma_2, \ldots, \pi_{r-1}, \sigma_r, \pi_r).$$

Then all p-colliders on $\pi'$ are in $S$ and all p-noncolliders are not in $S$, which yields 
that $\pi'$ is p-connecting given $S$. 

Conversely, suppose that $\pi$ is a p-connecting path between $a$ and $b$ given $S$. Let 
$u_{j_1}$ be the first vertex on the path that occurs more than once. Then $\pi$ can be 
partitioned as $\pi = (\pi'_0, \lambda_1, \pi_1)$ such that $u_{j_1}$ is an endpoint, but not an intermediate 
node, of $\pi'_0$ and $\pi_1$. Noting that $\pi'_0$ is already a trail, we continue to partition $\pi_1$ in 
the same way. After finitely many steps, we obtain the partition 

$$\pi = (\pi'_0, \lambda_1, \pi'_1, \lambda_2, \ldots, \pi'_{r-1}, \lambda_r, \pi'_r)$$

such that the subpaths $\pi'_j$ are all trails. Thus, the shortened path $\pi' = (\pi'_0, \ldots, \pi'_r)$ 
is also a trail. We show that $\pi'$ is a p-active trail relative to $S$. We firstly note that 
all subtrails $\pi'_j$ are p-connecting and hence p-active. We therefore have to show that 
the vertices $u_{j_i}$ satisfy the conditions for a p-active trail. 

Suppose that $u_{j_i}$ is a p-collider on $\pi'$ that is not in $S$. Then at least one of the edges 
adjacent to $u_{j_i}$ has an arrowhead at $u_{j_i}$ and we may assume that $\pi'_{i-1}$ is $u_{j_i}$-pointing 
(otherwise consider the reverse path). Since $u_{j_i} \notin S$, it must be a p-noncollider on 
$\pi$ and hence $\lambda_i$ starts with a tail at $u_{j_i}$. On the other hand, since $u_{j_i}$ must be a 
p-noncollider on all its occurrences on $\pi$ and $\pi'_i$ does not start with a tail, the loop 
$\lambda_i$ cannot be a directed path (otherwise $u_{j_i}$ would not be a p-noncollider on $(\lambda_i, \pi'_i)$). 
Consequently there exists an intermediate vertex $w_i$ such that the subpath between 
$u_{j_i}$ and $w_i$ is directed and $w_i$ is a p-collider. It follows that $w_i \in S$ and $u_{j_i} \in \mathrm{an}(S)$. 

Next, suppose that $u_{j_i}$ is a p-noncollider on $\pi'$ that is in $S$. Since $u_{j_i}$ has been 
a p-collider on $\pi$, the two edges adjacent to $u_{j_i}$ on $\pi'$ must be undirected and $\lambda_i$ 
must be a bi-directed path. Hence $\lambda_i$ is of the form $\lambda_i = (u_{j_i} \leftarrow w_i, \tilde\lambda_i)$ with $w_i \notin S$ 
(since $w_i$ is a p-noncollider and $\pi$ is p-connecting). Therefore, the set $\mathrm{pa}(u_{j_i}) \setminus S$ is 
not empty and $u_{j_i}$ satisfies the above condition (ii). Altogether it follows that $\pi'$ is 
p-active relative to $S$. □ 



In a remark on our simplified version of p-separation, Levitz et al. (2001) argue 
that there are infinitely many possibly self-intersecting paths in a graph as opposed 
to finitely many trails. The following lemma shows that it is possible to restrict the 
search for p-connecting paths in G to a finite number of paths, namely all paths in 
which no edge occurs twice with the same orientation. 

Lemma D.2. Let $G = (V, E)$ be a mixed graph and suppose that $\pi$ is a p-connecting 
path of the form $\pi = (\pi_1, e, \pi_2, e, \pi_3)$, where $e$ is an oriented edge between some 
vertices $u$ and $v$. Then the shortened path $\pi' = (\pi_1, e, \pi_3)$ is also p-connecting. 

Proof. Since $\pi$ is p-connecting, the two subpaths $(\pi_1, e)$ and $(e, \pi_3)$ are p-connecting. 
This implies that also $\pi'$ is p-connecting as every intermediate vertex has the same 
p-collider/noncollider status as in the corresponding subpath. □ 
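
Lemma D.2 suggests a simple, if exponential, search procedure: explore possibly self-intersecting paths, but never traverse the same edge twice in the same direction. The following minimal sketch illustrates this under stated assumptions: the function name `p_connected` and the edge-list encoding are hypothetical, an "oriented edge" is read as an edge together with its direction of traversal, a p-collider is taken to be an intermediate vertex with at least one arrowhead and no directed-edge tail at it (the head-no-tail description used above), and a path is taken to be p-connecting given $S$ if every intermediate p-collider lies in $S$ and every intermediate p-noncollider lies outside $S$.

```python
def p_connected(directed, undirected, a, b, S):
    """Return True if some p-connecting path between a and b given S exists,
    searching only paths that use no oriented edge twice (cf. Lemma D.2)."""
    S = set(S)
    # steps[x] lists (y, mark_at_x, mark_at_y) for every edge incident to x
    steps = {}

    def add(x, y, mark_x, mark_y):
        steps.setdefault(x, []).append((y, mark_x, mark_y))

    for u, v in directed:              # edge u -> v: tail at u, head at v
        add(u, v, "tail", "head")
        add(v, u, "head", "tail")
    for u, v in undirected:            # edge u -- v: neither head nor tail
        add(u, v, "none", "none")
        add(v, u, "none", "none")

    def is_open(c, mark_in, mark_out):
        marks = {mark_in, mark_out}
        collider = "head" in marks and "tail" not in marks
        return (c in S) if collider else (c not in S)

    def search(x, mark_in, used):
        for y, mark_x, mark_y in steps.get(x, []):
            step = (x, y, mark_x, mark_y)      # edge plus traversal direction
            if step in used:
                continue                       # oriented edge already used
            if mark_in is not None and not is_open(x, mark_in, mark_x):
                continue                       # x would block the path
            if y == b or search(y, mark_y, used | {step}):
                return True
        return False

    return search(a, None, frozenset())
```

The search terminates because the set of used oriented edges grows with every step and there are only finitely many of them. In the graph $a$ — $c$ — $b$ with the additional edge $w \to c$ and $S = \{c\}$, for example, the sketch finds the self-intersecting path $a$ — $c \leftarrow w \to c$ — $b$, on which $c$ is a p-collider at both occurrences.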